Title: Parallel processing of data
Abstract: A data parallel pipeline may specify multiple parallel data objects that contain multiple elements and multiple parallel operations that operate on the parallel data objects. Based on the data parallel pipeline, a dataflow graph of deferred parallel data objects and deferred parallel operations corresponding to the data parallel pipeline may be generated and one or more graph transformations may be applied to the dataflow graph to generate a revised dataflow graph that includes one or more of the deferred parallel data objects and deferred, combined parallel data operations. The deferred, combined parallel operations may be executed to produce materialized parallel data objects corresponding to the deferred parallel data objects.
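The flow described in the abstract (record operations lazily, transform the graph, then execute the combined operation) can be illustrated with a minimal sketch. This is not the patented implementation: the names `Deferred`, `parallel_do`, `fuse`, and `materialize` are hypothetical, only chains of element-wise operations are modeled, and execution is sequential for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Deferred:
    """Hypothetical deferred parallel data object: one node of the dataflow graph."""
    source: Optional[list] = None          # set only on input nodes
    parent: Optional["Deferred"] = None    # upstream deferred object, if any
    fns: List[Callable] = field(default_factory=list)  # element-wise ops to apply

    def parallel_do(self, fn: Callable) -> "Deferred":
        # Record the operation in the graph instead of running it.
        return Deferred(parent=self, fns=[fn])


def fuse(node: Deferred) -> Deferred:
    """Graph transformation: collapse a chain of element-wise operations
    into a single combined operation attached to the input node."""
    fns: List[Callable] = []
    while node.parent is not None:
        fns = node.fns + fns   # keep upstream functions first
        node = node.parent
    return Deferred(parent=node, fns=fns)


def materialize(node: Deferred) -> list:
    """Execute the (possibly combined) deferred operation and return real data."""
    if node.parent is None:
        return list(node.source)
    out = []
    for x in materialize(node.parent):
        for fn in node.fns:
            x = fn(x)
        out.append(x)
    return out


numbers = Deferred(source=[1, 2, 3])
pipeline = numbers.parallel_do(lambda x: x + 1).parallel_do(lambda x: x * 2)
combined = fuse(pipeline)          # one node carrying both functions
result = materialize(combined)     # nothing ran before this call
```

In this sketch the fused node makes a single pass over the input, whereas the unfused graph would produce an intermediate collection for each operation.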
Publication No.: US9626202(B2) | Publication Date: 2017.04.18
Application No.: US201514622556 | Filing Date: 2015.02.13
Applicant: Google Inc. | Inventors: Chambers Craig D.; Raniwala Ashish; Perry Frances J.; Adams Stephen R.; Henry Robert R.; Bradshaw Robert; Weizenbaum Nathan
Classification: G06F9/44; G06F9/46; G06F9/45; G06F9/455; G06F21/62; G06F9/38; G06F9/48; G06F9/445; G06F17/30; G06F9/30 | Main Classification: G06F9/44
Agency: Fish & Richardson P.C. | Agent: Fish & Richardson P.C.
Principal Claim: 1. A computer-implemented method executed by one or more processors, the method comprising: executing a deferred operation included in a dataflow graph for a data parallel pipeline to produce materialized data objects corresponding to the deferred operation, including: determining an estimated size of data associated with the deferred operation; determining that the estimated size exceeds a data size threshold; in response to determining that the estimated size exceeds the data size threshold, executing the deferred operation as a remote parallel operation with the one or more processors, wherein determining the dataflow graph for the data parallel pipeline includes analyzing sequential programming language instructions associated with the data parallel pipeline, and wherein the materialized data objects are configured to be accessed during execution of a program corresponding to the sequential programming language instructions associated with the data parallel pipeline.
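The size-based dispatch recited in the claim might look roughly like the sketch below. Several parts are assumptions made for illustration and are not stated in the claim: the threshold value, the use of element count as the size estimate, the local sequential fallback for small inputs, and a thread pool standing in for remote parallel workers. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

DATA_SIZE_THRESHOLD = 1000  # hypothetical cutoff, measured in elements


def estimate_size(data: List) -> int:
    # Assumption: element count stands in for the estimated data size.
    return len(data)


def run_local(fn: Callable, data: List) -> List:
    # Assumed fallback for small inputs: plain sequential execution.
    return [fn(x) for x in data]


def run_remote_parallel(fn: Callable, data: List, workers: int = 4) -> List:
    # Stand-in for remote parallel execution: a local thread pool.
    # pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, data))


def execute_deferred(fn: Callable, data: List,
                     threshold: int = DATA_SIZE_THRESHOLD) -> Tuple[str, List]:
    """Pick an execution strategy from the estimated size, then run the operation."""
    if estimate_size(data) > threshold:
        return "remote", run_remote_parallel(fn, data)
    return "local", run_local(fn, data)


small_mode, small_out = execute_deferred(lambda x: x * x, [1, 2, 3])
large_mode, large_out = execute_deferred(lambda x: x + 1, list(range(5)), threshold=3)
```

Both paths return an ordinary list, which mirrors the claim's requirement that the materialized data objects remain accessible to the program that defined the pipeline.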
Address: Mountain View CA US