System and method for large-scale data processing using an application-independent framework, Application No. US201314099806

Title of Invention	System and method for large-scale data processing using an application-independent framework
Abstract	A large-scale data processing system and method for processing data in a distributed and parallel processing environment is disclosed. The system comprises a set of interconnected computing systems, each having one or more processors and memory. The set of interconnected computing systems includes: a set of application-independent map modules for reading portions of input files containing data, and for producing intermediate data values by applying at least one user-specified, application-specific map operation to the data; a set of intermediate data structures distributed among a plurality of the interconnected computing systems for storing the intermediate data values; and a set of application-independent reduce modules, distinct from the plurality of application-independent map modules, for producing final output data by applying at least one user-specified, application-specific reduce operation to the intermediate data values.
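The split described in the abstract, where generic framework code drives user-supplied map and reduce operations through distributed intermediate storage, can be illustrated with a single-process Python toy. This is a sketch under stated assumptions, not the patented implementation: the word-count job, the names `run_job`, `word_count_map`, and `word_count_reduce`, and the hash-based routing of intermediate pairs into `num_partitions` in-memory buckets (standing in for intermediate data structures spread across machines) are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


# Hypothetical user-specified, application-specific operations (word count).
def word_count_map(key: str, value: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in value.split():
        yield word, 1


def word_count_reduce(key: str, values: Iterable[int]) -> int:
    """Sum all counts collected for one word."""
    return sum(values)


# Hypothetical application-independent framework: it never inspects the
# contents of keys or values, it only calls the user operations.
def run_job(
    input_records: Iterable[tuple[str, str]],
    map_op: Callable[[str, str], Iterable[tuple]],
    reduce_op: Callable[[object, list], object],
    num_partitions: int = 4,
) -> dict:
    # One bucket per partition; in a real deployment these would live on
    # different machines rather than in one process's memory.
    partitions = [defaultdict(list) for _ in range(num_partitions)]

    # Map phase: apply the user's map operation and route each intermediate
    # (key, value) pair to a partition chosen by hashing its key.
    for key, value in input_records:
        for ikey, ivalue in map_op(key, value):
            partitions[hash(ikey) % num_partitions][ikey].append(ivalue)

    # Reduce phase: each partition is reduced independently of the others,
    # visiting its keys in sorted order.
    output = {}
    for bucket in partitions:
        for ikey in sorted(bucket):
            output[ikey] = reduce_op(ikey, bucket[ikey])
    return output


if __name__ == "__main__":
    records = [("doc1:1", "the quick brown fox"), ("doc1:2", "the lazy dog")]
    print(run_job(records, word_count_map, word_count_reduce))
```

Swapping in a different pair of user operations changes the job without touching `run_job`, which is the sense in which the framework is application-independent.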
Publication No.	US9612883(B2)	Publication Date	2017.04.04
Application No.	US201314099806	Filing Date	2013.12.06
Applicant	Google Inc.	Inventors	Dean Jeffrey; Ghemawat Sanjay
Classification	G06F17/30; G06F9/54; G06F9/48	Main Classification	G06F17/30
Agency	Morgan, Lewis & Bockius LLP	Agent	Morgan, Lewis & Bockius LLP
Principal Claim	1. A system for large-scale processing of data in a distributed and parallel processing environment, comprising: a set of interconnected computing systems, each having one or more processors and memory, the set of interconnected computing systems including: a plurality of worker processes executing on the set of interconnected computing systems; an application-independent supervisory process executing on the set of interconnected computing systems, for: determining, for input files, a plurality of data processing tasks including a plurality of map tasks specifying data from the input files to be processed into intermediate data values and a plurality of reduce tasks specifying intermediate data values to be processed into final output data; and assigning the data processing tasks to idle ones of the worker processes; a set of application-independent map functions, executed by a first subset of the plurality of worker processes, for reading portions of the input files containing data, and for producing intermediate data values by applying at least one user-specified, application-specific map operation to the data, wherein the set of application-independent map functions are independent of the at least one user-specified, application-specific map operation; a set of intermediate data structures distributed among a plurality of the interconnected computing systems for storing the intermediate data values; and a set of application-independent reduce functions, distinct from the set of application-independent map functions, the set of application-independent reduce functions executed by a second subset of the plurality of worker processes for producing the final output data by applying at least one user-specified, application-specific reduce operation to the intermediate data values, wherein the set of application-independent reduce functions are independent of the at least one user-specified, application-specific reduce operation.
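The claim's supervisory process, which determines map and reduce tasks and hands each one to an idle worker, can be sketched with Python threads standing in for worker processes on separate machines. Everything here is an illustrative assumption rather than the claimed system: the names `supervise`, `worker`, `user_map`, `user_reduce`, and `NUM_PARTITIONS`; one inbox queue per worker plus a shared result queue as the communication channel; and a barrier that builds reduce tasks only after every map task has finished.

```python
import queue
import threading
from collections import defaultdict

NUM_PARTITIONS = 2  # hypothetical number of reduce tasks


def user_map(text):
    """Hypothetical user map operation: emit (word, 1) pairs."""
    for word in text.split():
        yield word, 1


def user_reduce(key, values):
    """Hypothetical user reduce operation: sum the counts."""
    return sum(values)


def worker(worker_id, inbox, outbox):
    """Generic worker: runs whichever task the supervisor assigns to it."""
    while True:
        task = inbox.get()
        if task is None:  # shutdown signal from the supervisor
            return
        kind, payload = task
        if kind == "map":
            # Partition intermediate pairs by key hash.
            result = [defaultdict(list) for _ in range(NUM_PARTITIONS)]
            for k, v in user_map(payload):
                result[hash(k) % NUM_PARTITIONS][k].append(v)
        else:  # "reduce"
            result = {k: user_reduce(k, vs) for k, vs in payload.items()}
        outbox.put((worker_id, result))  # reporting back marks the worker idle


def supervise(input_splits, num_workers=3):
    inboxes = [queue.Queue() for _ in range(num_workers)]
    outbox = queue.Queue()
    threads = [threading.Thread(target=worker, args=(i, inboxes[i], outbox))
               for i in range(num_workers)]
    for t in threads:
        t.start()

    def run_phase(tasks):
        pending, idle, results, outstanding = list(tasks), list(range(num_workers)), [], 0
        while pending or outstanding:
            while pending and idle:  # assign tasks only to idle workers
                inboxes[idle.pop()].put(pending.pop())
                outstanding += 1
            wid, res = outbox.get()  # block until some worker finishes
            idle.append(wid)
            outstanding -= 1
            results.append(res)
        return results

    # Map tasks: one per input split.
    map_results = run_phase(("map", split) for split in input_splits)

    # Reduce tasks: one per partition, built from all map outputs.
    merged = [defaultdict(list) for _ in range(NUM_PARTITIONS)]
    for parts in map_results:
        for p, bucket in enumerate(parts):
            for k, vs in bucket.items():
                merged[p][k].extend(vs)
    reduce_results = run_phase(("reduce", dict(m)) for m in merged)

    for inbox in inboxes:
        inbox.put(None)
    for t in threads:
        t.join()

    output = {}
    for r in reduce_results:
        output.update(r)
    return output


if __name__ == "__main__":
    print(supervise(["a b a", "b c", "c c"], num_workers=2))
```

Because the supervisor only tracks which workers are idle and which tasks are pending, the same scheduling loop serves both phases; the workers likewise stay generic and defer all application logic to `user_map` and `user_reduce`.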
Address	Mountain View CA US