Title: System and method for analyzing data records
Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
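The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `query` function, the word-count task, and the use of a `Counter` as the intermediate data structure are all assumptions made for illustration, and a thread pool stands in for the "first plurality of processes".

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def query(record):
    """Hypothetical query: produce zero or more values from one record.

    Here it returns every word longer than three characters.
    """
    return [word for word in record.split() if len(word) > 3]


def process_group(group):
    """Handle one allocated group of records and build an intermediate structure."""
    intermediate = Counter()
    for record in group:
        for value in query(record):
            # Assumed "emit" behavior: count each produced value.
            intermediate[value] += 1
    return intermediate


def analyze(groups):
    """Run one worker per group, then aggregate the intermediate structures."""
    with ThreadPoolExecutor() as pool:
        intermediates = list(pool.map(process_group, groups))
    output = Counter()
    for intermediate in intermediates:
        output.update(intermediate)  # Counter.update adds counts together
    return output


groups = [["alpha beta", "to be"], ["alpha gamma alpha"]]
result = analyze(groups)
```

The record "to be" produces no values, which matches the "zero or more values" wording: a record may contribute nothing to the intermediate structure.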
Publication number: US9405808(B2)
Publication date: 2016.08.02
Application number: US201213407632
Filing date: 2012.02.28
Applicant: GOOGLE INC.
Inventors: Pike Robert C.; Quinlan Sean; Dorward Sean M.; Dean Jeffrey; Ghemawat Sanjay
Classification: G06F17/30; G06F11/14
Main classification: G06F17/30
Agency: Morgan, Lewis & Bockius LLP
Agent: Morgan, Lewis & Bockius LLP
Main claim: 1. A computer-implemented method of processing a plurality of data records, performed on a system having one or more processors and memory storing one or more programs for execution by the one or more processors to perform the computer-implemented method, comprising: allocating subgroups of the plurality of data records to respective processes of a first plurality of processes; after the allocating, executing in parallel, in each respective process of the first plurality of processes, application-specific and application-independent operations comprising: for at least one data record in at least a subset of the subgroups of data records allocated to the respective process: extracting information from the at least one data record, by using one or more application-specific data processing operators provided by an application programmer; applying a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more values, wherein at least one step in the multi-step script includes selecting a respective application-independent emit operator on an application-specific basis and applying the respective application-independent emit operator to the information extracted from the at least one data record; and storing the one or more values in one or more intermediate data structures in a plurality of intermediate data structures; and in each process of a second plurality of processes, aggregating values from a subset of the plurality of intermediate data structures to produce output data.
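The two-phase structure recited in the claim can be sketched as follows. This is an illustrative reading only, not the claimed implementation: the record format (`host,bytes`), the operator names `sum` and `max`, and every function name are hypothetical, and both phases run sequentially here where the claim describes parallel processes.

```python
# Application-independent emit operators (hypothetical names and behavior).
def emit_sum(table, key, value):
    table[key] = table.get(key, 0) + value


def emit_max(table, key, value):
    table[key] = max(table.get(key, value), value)


EMIT_OPERATORS = {"sum": emit_sum, "max": emit_max}


def extract(record):
    """Application-specific extraction: parse an assumed 'host,bytes' record."""
    host, size = record.split(",")
    return host, int(size)


def run_script(record, table, emit_name):
    """Multi-step script applied sequentially to one record."""
    host, size = extract(record)                # step 1: extract information
    kilobytes = size // 1024                    # step 2: derive a value
    EMIT_OPERATORS[emit_name](table, host, kilobytes)  # step 3: selected emit


def first_phase(subgroups, emit_name):
    """One intermediate table per allocated subgroup of records."""
    tables = []
    for subgroup in subgroups:
        table = {}
        for record in subgroup:
            run_script(record, table, emit_name)
        tables.append(table)
    return tables


def second_phase(tables, emit_name, num_aggregators=2):
    """Each aggregator merges its own subset of the intermediate tables."""
    emit = EMIT_OPERATORS[emit_name]
    outputs = []
    for i in range(num_aggregators):
        merged = {}
        for table in tables[i::num_aggregators]:  # this aggregator's subset
            for key, value in table.items():
                emit(merged, key, value)
        outputs.append(merged)
    return outputs


subgroups = [["a,2048", "b,1024"], ["a,4096"], ["b,3072", "a,1024"]]
sum_outputs = second_phase(first_phase(subgroups, "sum"), "sum")
max_outputs = second_phase(first_phase(subgroups, "max"), "max")
```

Reusing the same operator to merge tables works here because sum and max are both associative and commutative; an operator without those properties would need a separate merge rule.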
Address: Mountain View, CA, US