Title: Evolving parallel system to automatically improve the performance of multiple concurrent tasks on large datasets
Abstract: We describe a high-level computational framework especially well suited to parallel operations on large datasets. A system in accordance with this framework contains at least one instance, and generally several instances, of an architecture deployment, as further described. We use the term "architecture deployment" herein to mean a cooperating group of processes together with the hardware on which those processes execute. This does not imply a one-to-one association between any process and particular hardware. On the contrary, as detailed below, an architecture deployment may dynamically spawn another deployment as needed, including provisioning the required hardware. Together, the active architecture deployments form a robust, automatically scalable system that dynamically processes jobs requested by a user-customer in accordance with the customer's monetary budget and other criteria.
Publication number: US9098326(B1); Publication date: 2015.08.04
Application number: US201213673502; Filing date: 2012.11.09
Applicant: BigML, Inc.; Inventors: Martin Francisco J.; Ashenfelter Adam; Donaldson J. Justin; Verwoerd Jos; Ortega Jose Antonio; Parker Charles
Classification: G06F9/46; G06F9/50; Main classification: G06F9/46
Agency: Stolowitz Ford Cowger LLP; Agent: Stolowitz Ford Cowger LLP
Main claim: 1. A computer-implemented method of processing tree models corresponding to user data of plural users, the method comprising the steps of:
deploying a first architecture deployment instance, the first architecture deployment instance storing a respective tree model processing budget for each one of plural users against which tree model processing costs for the corresponding user are applied, providing a user interface process with which each user uploads its user data, providing a data analysis process to analyze each user's data and convert it into a corresponding dataset, and providing a model builder process to construct a corresponding decision tree model based on the dataset of each user, the first architecture deployment instance employing first computing resources;
wherein providing the model builder process includes, for a selected dataset,
(a) distributing plural partitions of the selected dataset from a master process to plural respective worker processes;
(b) until a predetermined tree model building criterion is met by a finished first tree model, distributing a first tree model of the selected dataset from the master process to the plural worker processes;
(c) processing the respective partition of the selected dataset at each worker process with the first tree model to obtain a local tree model result; and
(d) updating the first tree model at the master process according to one or more of the local tree model results and returning to step (b), in an iterative fashion, wherein updating the first tree model comprises growing an additional layer of the tree model in each iteration; and
wherein updating the first tree model at the master process according to one or more of the local tree model results includes, at each worker process, compressing its local results into a series of histograms, one histogram for each input variable of the dataset, and transmitting the histograms to the master process.
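The master/worker loop in steps (a) to (d) of the claim can be illustrated with a minimal Python sketch. It rests on several assumptions that the claim does not state. Every variable uses fixed bin edges shared by all workers, so histograms merge simply by adding counts; this stands in for whatever histogram scheme the patent actually uses. Histograms are kept per leaf still being grown and per class label. Gini impurity is the split criterion, and a maximum depth (or no impurity gain) serves as the "predetermined tree model building criterion". Workers are simulated with ordinary function calls rather than separate processes. All function names (`worker_histograms`, `merge`, `best_split`, `build_tree`, `predict`) are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def new_hist():
    # Hypothetical layout: hist[leaf][feature][bin] -> Counter of class labels
    return defaultdict(lambda: defaultdict(lambda: defaultdict(Counter)))


def route(tree, x):
    """Follow split nodes from the root (node 0) until reaching a leaf id."""
    node = 0
    while "feature" in tree[node]:
        n = tree[node]
        node = n["left"] if x[n["feature"]] < n["threshold"] else n["right"]
    return node


def worker_histograms(tree, partition, edges, frontier):
    """Step (c): compress one partition into per-leaf, per-variable histograms."""
    hist = new_hist()
    for x, y in partition:
        leaf = route(tree, x)
        if leaf in frontier:  # only rows in leaves still being grown matter
            for f, v in enumerate(x):
                hist[leaf][f][bisect_right(edges[f], v)][y] += 1
    return hist


def merge(histograms):
    """Master side: histograms with shared bin edges merge by adding counts."""
    total = new_hist()
    for h in histograms:
        for leaf, feats in h.items():
            for f, bins in feats.items():
                for b, counts in bins.items():
                    total[leaf][f][b].update(counts)
    return total


def gini(counts):
    n = sum(counts.values())
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(feats, edges):
    """Return (weighted Gini, feature, threshold) of the best split, or None."""
    best = None
    for f, bins in feats.items():
        for k, t in enumerate(edges[f]):
            left, right = Counter(), Counter()
            for b, counts in bins.items():
                # Bin b holds values v with exactly b edges <= v, so v < edges[k] iff b <= k
                (left if b <= k else right).update(counts)
            nl, nr = sum(left.values()), sum(right.values())
            if nl and nr:
                score = (nl * gini(left) + nr * gini(right)) / (nl + nr)
                if best is None or score < best[0]:
                    best = (score, f, t)
    return best


def build_tree(partitions, edges, max_depth):
    """Steps (b) to (d): grow one layer per iteration until no leaf can split."""
    tree = {0: {"depth": 0}}
    frontier = [0]
    while frontier:
        # (b)+(c): each "worker" summarises its partition against the current tree
        local = [worker_histograms(tree, p, edges, set(frontier)) for p in partitions]
        merged = merge(local)  # (d): the master combines the workers' histograms
        next_frontier = []
        for leaf in frontier:
            feats = merged[leaf]
            counts = Counter()
            for c in feats[0].values():  # class totals recovered from any one variable
                counts.update(c)
            tree[leaf]["pred"] = counts.most_common(1)[0][0]
            split = best_split(feats, edges)
            if tree[leaf]["depth"] >= max_depth or split is None or split[0] >= gini(counts):
                continue  # this leaf is final
            _, f, t = split
            left, right = len(tree), len(tree) + 1
            depth = tree[leaf]["depth"] + 1
            tree[left], tree[right] = {"depth": depth}, {"depth": depth}
            tree[leaf].update(feature=f, threshold=t, left=left, right=right)
            next_frontier += [left, right]
        frontier = next_frontier  # the newly grown layer
    return tree


def predict(tree, x):
    return tree[route(tree, x)]["pred"]
```

For example, `build_tree([rows[i::3] for i in range(3)], [[2, 5, 8], [2, 5, 8]], max_depth=2)` spreads a toy dataset `rows` of `((x0, x1), label)` pairs over three simulated workers. Only the compact histograms, not the rows, flow back to the master on each iteration, which is the point of the compression step in the claim.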
Address: Corvallis, OR, US