Title of Invention: System and Method For Analyzing Data Records
Abstract: A method processes data records. The method partitions the data records into groups and assigns each group to a respective process of a first plurality of processes, which execute in parallel. For each group, the assigned process extracts information from the data records, applies a multi-step script whose information processing commands are applied sequentially to produce intermediate values, stores the intermediate values in a respective intermediate data structure, and updates the status of the group to indicate completion. When a predefined threshold percentage of the data records has been completed, the method assigns each group to a respective second process as a backup. When each of the groups has been completed by at least one process (either the original or the backup), the method executes a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data. The aggregation includes the intermediate values for each group only once.
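The first phase described in the abstract (partition, parallel per-group processing with a sequential script, per-group intermediate storage, status update, then a single aggregation pass) can be sketched as follows. This is a minimal, hypothetical illustration, not the patented implementation: it assumes records are `"key,value"` strings, that "extracting information" means keeping the integer value, that each script command is a Python function from a list to a list, that threads stand in for processes, and that aggregation is a plain sum. The names `partition`, `process_group`, and `run` are invented for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Assumption: a "command" maps a list of extracted values to a new list.
Command = Callable[[List[int]], List[int]]


def partition(records: List[str], num_groups: int) -> List[List[str]]:
    """Split records into at most num_groups contiguous, non-empty groups."""
    size = max(1, -(-len(records) // num_groups))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def process_group(group_id: int, group: List[str], script: List[Command],
                  intermediate: Dict[int, List[int]],
                  status: Dict[int, str]) -> None:
    # Extract: assume each record looks like "key,value" and keep the integer value.
    values = [int(record.split(",")[1]) for record in group]
    # Apply the multi-step script: each command consumes the previous output.
    for command in script:
        values = command(values)
    # Store results in this group's own slot, then mark the group complete.
    intermediate[group_id] = values
    status[group_id] = "done"


def run(records: List[str], num_groups: int, script: List[Command]) -> int:
    groups = partition(records, num_groups)
    if not groups:
        return 0
    intermediate: Dict[int, List[int]] = {}
    status = {gid: "pending" for gid in range(len(groups))}
    # Threads stand in for the parallel processes; leaving the block waits for all.
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        futures = [pool.submit(process_group, gid, group, script, intermediate, status)
                   for gid, group in enumerate(groups)]
    for future in futures:
        future.result()  # re-raise any worker exception
    if any(state != "done" for state in status.values()):
        raise RuntimeError("some group did not complete")
    # Aggregation: every group is complete; sum each group's values exactly once.
    return sum(sum(values) for values in intermediate.values())
```

For example, with the records `["a,1", "b,-2", "c,3", "d,4"]`, two groups, and a two-command script that first drops non-positive values and then doubles the rest, the groups yield `[2]` and `[6, 8]`, so `run` returns 16.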
Application Publication No.: US2016342657(A1)  Publication Date: 2016.11.24
Application No.: US201615226795  Filing Date: 2016.08.02
Applicant: GOOGLE INC.  Inventors: Pike Robert C.; Quinlan Sean; Dorward Sean M.; Dean Jeffrey; Ghemawat Sanjay
Classification: G06F17/30  Main Classification: G06F17/30
Agency: (none listed)  Agent: (none listed)
Main Claim: 1. A computer-implemented method of processing a plurality of data records, performed on a system having one or more processors and memory storing one or more programs for execution by the one or more processors to perform the computer-implemented method, comprising: partitioning the plurality of data records into groups and assigning each group of data records to a respective process of a first plurality of processes; executing the first plurality of processes in parallel, wherein for each group the assigned process: extracts information from the data records in the group; applies a multi-step script comprising a plurality of information processing commands applied sequentially to the extracted information to produce one or more intermediate values; stores the one or more intermediate values in a respective intermediate data structure; and updates a status of the group to indicate completion; determining whether a predefined threshold percentage of the data records are completed based on the status updates provided by the processes; when it is determined that the predefined threshold percentage of the data records are completed, assigning each group to a respective second process of the first plurality of processes; when it is determined that each of the groups has been completed by at least one process, executing a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data, wherein the aggregation includes intermediate values only once for each group.
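The backup and exactly-once steps of the claim (checking a threshold percentage of completed records, then assigning groups to second processes, and counting each group's result only once) can be sketched as below. This is a hypothetical illustration under stated assumptions, not the claimed implementation: threads stand in for processes, each group is a list of integers whose "intermediate value" is its sum, an artificial per-group `delay` simulates a slow primary, and `run_group` / `run_with_backups` are invented names. As a simplification, backups are launched only for groups that have not yet finished, since a backup of a finished group would only produce a duplicate that the aggregation discards. `cancel_futures` requires Python 3.9 or later.

```python
import time
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Dict, List


def run_group(group: List[int], delay: float) -> int:
    """Hypothetical worker: sleep to simulate its speed, then sum the group."""
    time.sleep(delay)
    return sum(group)


def run_with_backups(groups: List[List[int]], delays: List[float],
                     threshold: float = 0.75) -> int:
    total_records = sum(len(g) for g in groups)
    results: Dict[int, int] = {}  # first finished copy of each group wins
    completed_records = 0
    backups_launched = False
    pool = ThreadPoolExecutor(max_workers=2 * len(groups))
    try:
        # Primary assignment: one future per group.
        pending: Dict[Future, int] = {
            pool.submit(run_group, g, d): gid
            for gid, (g, d) in enumerate(zip(groups, delays))
        }
        while len(results) < len(groups):
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                gid = pending.pop(future)
                if gid not in results:  # a late duplicate is ignored
                    results[gid] = future.result()
                    completed_records += len(groups[gid])
            # Once the threshold share of records is done, back up unfinished groups.
            if not backups_launched and completed_records >= threshold * total_records:
                backups_launched = True
                for gid, g in enumerate(groups):
                    if gid not in results:
                        pending[pool.submit(run_group, g, 0.0)] = gid
    finally:
        # Do not wait for stragglers whose group already has a result.
        pool.shutdown(wait=False, cancel_futures=True)
    # Aggregate each group's intermediate value exactly once.
    return sum(results.values())
```

With four equal groups and a 0.75 threshold, the backup for a slow fourth group is launched as soon as the other three finish; whichever copy of that group completes first is counted, and the other copy's result is dropped, so the total matches a single run over all records.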
Address: MOUNTAIN VIEW CA US