Title of invention Dynamic tree determination for data processing
Abstract Data can be processed in parallel across a cluster of nodes using a parallel processing framework. Using Web services calls between components allows the number of nodes to be scaled as necessary, and allows developers to build applications on the framework using a Web services interface. A job scheduler works together with a queuing service to distribute jobs to nodes as the nodes have capacity, such that jobs can be performed in parallel as quickly as the nodes are able to process the jobs. Data can be loaded efficiently across the cluster, and levels of nodes can be determined dynamically to process queries and other requests on the system.
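The queue-driven distribution described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: in-process threads stand in for cluster nodes, Python's standard `queue.Queue` stands in for the queuing service, and the names `run_jobs` and `node_worker` are hypothetical. Each simulated node takes the next job only when it is free, so jobs finish as quickly as the nodes can process them.

```python
import queue
import threading


def run_jobs(jobs, node_count):
    """Distribute (job_id, func, arg) jobs across simulated nodes.

    Assumption: threads model nodes and an in-memory queue models the
    queuing service; the source describes Web services calls instead.
    """
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)

    results = {}
    lock = threading.Lock()

    def node_worker(node_id):
        # A node pulls work only when it has capacity, i.e. when it is idle.
        while True:
            try:
                job_id, func, arg = job_queue.get_nowait()
            except queue.Empty:
                return  # no jobs left for this node
            value = func(arg)
            with lock:
                results[job_id] = (node_id, value)

    threads = [
        threading.Thread(target=node_worker, args=(node_id,))
        for node_id in range(node_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

For example, `run_jobs([(i, lambda x: x * x, i) for i in range(5)], node_count=2)` returns a dictionary mapping each job id to the node that ran it and the computed value. Which node handles which job depends on timing and is not fixed.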
Publication number US9063976(B1) Publication date 2015.06.23
Application number US201314107570 Filing date 2013.12.16
Applicant Amazon Technologies, Inc. Inventors Bacthavachalu Govindaswamy; Gavares Peter Grant; Badran Ahmed A.; Scharf, Jr. James E.
Classification G06F17/30 Main classification G06F17/30
Agency Kilpatrick Townsend & Stockton LLP Agent Kilpatrick Townsend & Stockton LLP
Principal claim 1. A computer-implemented method, comprising: specifying, by a computer system, a first set of nodes in a group of nodes to process a first portion of a request; determining a second set of nodes in the group of nodes to process a second portion of the request, wherein: a size of the second set of nodes is based at least in part on a predicted number of first results for the first portion of the request, the first set of nodes and the second set of nodes are operable to form a hierarchy such that at least one node of the second set of nodes is a parent node to at least one child node in the first set of nodes, the at least one node of the second set of nodes being operable to process, as part of the second portion of the request, a result of the first portion of the request from the at least one child node in the first set of nodes, the second set of nodes are operable to have a plurality of hierarchical levels such that one or more parent nodes of the second set of nodes processes results of corresponding one or more child nodes in the second set of nodes, the number of the plurality of hierarchical levels being based at least in part on the predicted number of first results and an expected capacity of the second nodes; scheduling, by the computer system, a job to be executed on a single node of the group of nodes, the job corresponding to at least one of the first portion of the request or the second portion of the request; and causing at least a subset of the second set of nodes to process the second portion of the request when a respective portion of the first set of nodes finishes processing the first portion of the request, at least a portion of the second set of nodes operable to process the second portion in parallel.
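The claim ties the size and depth of the second set of nodes to a predicted number of first results and an expected node capacity, without fixing a formula. The sketch below shows one way such a sizing could work; it is an illustration under stated assumptions, not the claimed method. The function name `plan_reduce_levels` is hypothetical. It assumes that capacity means the number of child results one node can process, and that each second-set node emits exactly one result to its parent.

```python
def plan_reduce_levels(predicted_results: int, node_capacity: int) -> list[int]:
    """Return the number of second-set nodes needed at each hierarchical level.

    Assumptions (not from the source): node_capacity is the number of
    child results one node can process, and every node emits one result.
    """
    if predicted_results < 1:
        raise ValueError("predicted_results must be at least 1")
    if node_capacity < 2:
        # With capacity 1 the hierarchy would never narrow to a single root.
        raise ValueError("node_capacity must be at least 2")

    levels = []
    inputs = predicted_results
    while True:
        # Ceiling division: enough nodes to absorb all inputs at this level.
        nodes = (inputs + node_capacity - 1) // node_capacity
        levels.append(nodes)
        if nodes == 1:
            break  # a single root node produces the final result
        inputs = nodes  # each node's single result feeds the next level
    return levels
```

With 1000 predicted first results and a capacity of 10 results per node, `plan_reduce_levels(1000, 10)` yields `[100, 10, 1]`: three hierarchical levels and 111 second-set nodes in total. A smaller prediction such as 5 results collapses the plan to a single node, `[1]`.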
Address Reno NV US