Abstract |
A technique is provided for parallel processing of data from a plurality of data sources in conjunction with an Extract-Transform-Load (ETL) process, the data being part of a related data set. The technique comprises: staging a unit of extracted data from each of the plurality of data sources, thereby generating a plurality of units of staged data; identifying a plurality of tasks relating to transforming the staged data; assigning a subset of the tasks to each of a plurality of child processes managed by a master process, such that dependent tasks are assigned to the same child process; concurrently executing the subsets of tasks assigned to the child processes, thereby generating a plurality of units of transformed data from the plurality of units of staged data; and publishing the transformed data after all tasks have been completely executed, thereby ensuring that the published data represent the related data set.
|
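The steps recited in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: every name in it (`group_dependent_tasks`, `assign_to_children`, `run_subset`, `run_etl`, the `clean` and `dedupe` transforms, and the task tuple layout) is hypothetical. It also assumes that tasks are listed in a valid execution order, that dependencies are given as `(before, after)` pairs of task identifiers, and that "publishing" can be modeled as returning the combined result.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def group_dependent_tasks(task_ids, dependencies):
    """Merge tasks linked by a dependency into one group (union-find)."""
    parent = {t: t for t in task_ids}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for before, after in dependencies:
        parent[find(after)] = find(before)

    groups = defaultdict(list)
    for t in task_ids:  # assumption: task_ids is already in a valid execution order
        groups[find(t)].append(t)
    return list(groups.values())


def assign_to_children(groups, n_children):
    """Give each child a subset of tasks; a dependency group is never split."""
    subsets = [[] for _ in range(n_children)]
    for group in sorted(groups, key=len, reverse=True):
        min(subsets, key=len).extend(group)  # least-loaded child takes the whole group
    return subsets


# Hypothetical transforms, defined at module level so child processes can look them up.
def clean(rows):
    return [r.strip().lower() for r in rows]


def dedupe(rows):
    return sorted(set(rows))


TRANSFORMS = {"clean": clean, "dedupe": dedupe}


def run_subset(subset, staged):
    """Child body: apply the assigned tasks, in order, to the staged units."""
    data = dict(staged)
    for _task_id, source, op in subset:
        data[source] = TRANSFORMS[op](data[source])
    return {source: data[source] for _task_id, source, _op in subset}


def run_etl(staged, tasks, dependencies, n_children=2,
            executor_cls=ProcessPoolExecutor):
    """Master: assign subsets, run them concurrently, publish after all finish."""
    by_id = {t[0]: t for t in tasks}
    groups = group_dependent_tasks(list(by_id), dependencies)
    subsets = assign_to_children(groups, n_children)
    transformed = {}
    with executor_cls(max_workers=n_children) as pool:
        futures = [pool.submit(run_subset, [by_id[i] for i in s], staged)
                   for s in subsets if s]
        for future in futures:
            # result() re-raises a child's failure, so nothing is published
            # unless every task completed.
            transformed.update(future.result())
    return transformed  # stands in for the publish step
```

Calling `run_etl(staged, tasks, dependencies)` from a script's `if __name__ == "__main__":` block runs the subsets in separate child processes. Passing `executor_cls=ThreadPoolExecutor` swaps in threads, which can be convenient for quick experiments. Keeping each dependency group inside one child means that ordering between dependent tasks needs no cross-process coordination, at the cost of less even load balancing when one group is much larger than the others.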