Title PARALLEL PROCESSING FOR ETL PROCESSES
Abstract A technique for parallel processing of data from a plurality of data sources in conjunction with an Extract-Transform-Load (ETL) process, the data being part of a related data set. The technique comprises: staging a unit of extracted data from each of the plurality of data sources, thereby generating a plurality of units of staged data; identifying a plurality of tasks relating to transforming the staged data; assigning a subset of the tasks to each of a plurality of child processes managed by a master process, such that dependent tasks are assigned to the same child process; concurrently executing the subsets of tasks assigned to the child processes, thereby generating a plurality of units of transformed data from the plurality of units of staged data; and publishing the transformed data after all tasks have been completely executed, thereby ensuring that the published data represent the related data set.
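The steps in the abstract can be illustrated with a minimal Python sketch. It is not the implementation described in the application, only one possible reading of it under stated assumptions: every function and parameter name (`group_dependent_tasks`, `assign_to_children`, `run_etl`, `staged`, `transforms`, `deps`) is hypothetical, staged units are keyed by task name for simplicity, and worker threads stand in for the child processes so that the example stays portable. Dependent tasks are kept together by treating dependencies as links and assigning each linked group, whole, to one worker.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+


def group_dependent_tasks(deps):
    """Group tasks so that any two tasks linked by a dependency share a group.

    `deps` (hypothetical format) maps a task name to the set of task names
    it depends on.
    """
    parent = {t: t for t in deps}
    for needs in deps.values():
        for need in needs:
            parent.setdefault(need, need)

    def find(t):
        # Walk to the representative of t's group, shortening the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for task, needs in deps.items():
        for need in needs:
            parent[find(task)] = find(need)

    groups = {}
    for t in parent:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())


def assign_to_children(deps, n_children):
    """Give each dependency group, whole, to the least-loaded child."""
    buckets = [[] for _ in range(n_children)]
    for group in sorted(group_dependent_tasks(deps), key=len, reverse=True):
        min(buckets, key=len).extend(group)
    # Order each child's subset so that prerequisites run first.
    ordered = []
    for bucket in buckets:
        sub = {t: deps.get(t, set()) for t in bucket}
        ordered.append(list(TopologicalSorter(sub).static_order()))
    return ordered


def run_etl(staged, transforms, deps, n_children=2):
    """Transform staged units concurrently; return data to publish.

    Assumption: `transforms[task]` is a callable taking the staged unit and
    the results already produced by the same child.
    """
    subsets = assign_to_children(deps, n_children)

    def child(subset):
        done = {}
        for task in subset:
            done[task] = transforms[task](staged[task], done)
        return done

    # The "master" waits here until every child has finished its subset.
    with ThreadPoolExecutor(max_workers=n_children) as pool:
        partials = list(pool.map(child, subsets))

    # Publish only after all tasks completed; an exception in any child
    # propagates above, so no partial data set is returned.
    published = {}
    for part in partials:
        published.update(part)
    return published
```

Because each dependency group lands on a single worker, a dependent task can read its prerequisite's result from local state without cross-worker coordination. Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` would be closer to the master and child process arrangement in the abstract, but would require the worker function and the transforms to be picklable top-level callables.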
Publication No. US2008222634(A1) Publication date 2008.09.11
Application No. US20070682815 Filing date 2007.03.06
Applicant YAHOO! INC. Inventor RUSTAGI AMIT
Classification G06F9/46 Main classification G06F9/46
Agency Agent
Principal claim
Address