Title of invention: Task-based modeling for parallel data integration
Abstract: System, method, and computer program product to perform an operation for task-based modeling for parallel data integration, by determining, for a data flow, a set of processing units, each of the set of processing units defining one or more data processing operations to process the data flow, generating a set of tasks to represent the set of processing units, each task in the set of tasks comprising one or more of the data processing operations of the set of processing units, optimizing the set of tasks based on a set of characteristics of the data flow, and generating a composite execution plan based on the optimized set of tasks to process the data flow in a distributed computing environment.
Publication number: US9477511(B2) Publication date: 2016.10.25
Application number: US201313966903 Filing date: 2013.08.14
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: Jacobson Eric A.; Li Yong; Mudambi Shyam R.; Pu Xiaoyan
Classification: G06F9/44; G06F9/46; G06F9/50 Main classification: G06F9/44
Agency: Patterson + Sheridan, LLP Agent: Patterson + Sheridan, LLP
Principal claim: 1. A system, comprising: one or more computer processors; and a memory containing a program which when executed by the one or more computer processors performs an operation, the operation comprising: determining, for a data flow, a set of processing units, each processing unit of the set of processing units defining one or more data processing operations to process the data flow, wherein the determination of the set of processing units comprises: determining a partition source to associate with each processing unit, inserting one or more processing units into the set to automatically repartition the partition source associated with at least one of the processing units, and optimizing virtual datasets associated with the dataflow; generating a set of tasks to represent the set of processing units, each task in the set of tasks comprising one or more of the data processing operations of the set of processing units; optimizing the set of tasks based on a set of characteristics of the data flow comprising one or more user defined requirements; determining an execution order for the set of tasks based on the one or more user defined requirements; generating a composite execution plan based on the optimized set of tasks to process the data flow in a distributed computing environment and the execution order.
Address: Armonk NY US
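
The operation recited in the principal claim (determine processing units, insert repartitioning units, group operations into tasks, optimize, order, and emit a plan) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration and not the patented implementation: the `Unit` class, the `partition_key` field standing in for a "partition source", the chain-fusing rule used as the "optimization", and all function and operation names are assumptions. Virtual-dataset optimization and user-defined requirements are not modeled; the execution order here comes only from data dependencies.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Unit:
    """Hypothetical stand-in for a processing unit (reused to represent a task)."""
    name: str
    operations: list
    partition_key: str = None  # assumed stand-in for the "partition source"
    upstream: list = field(default_factory=list)


def insert_repartitions(units):
    """Insert a repartition unit on every edge whose two ends use different keys."""
    by_name = {u.name: u for u in units}
    out = []
    for u in units:
        upstream = []
        for up_name in u.upstream:
            if by_name[up_name].partition_key != u.partition_key:
                rp = Unit(f"repartition_{up_name}_{u.name}",
                          [f"repartition_by:{u.partition_key}"],
                          u.partition_key, [up_name])
                out.append(rp)
                upstream.append(rp.name)
            else:
                upstream.append(up_name)
        out.append(Unit(u.name, list(u.operations), u.partition_key, upstream))
    return out


def fuse_chains(tasks):
    """Toy optimization: merge a task into its only upstream task when that
    upstream has no other consumer and both share a partition key."""
    tasks = {t.name: t for t in tasks}
    changed = True
    while changed:
        changed = False
        consumers = {}
        for t in tasks.values():
            for up_name in t.upstream:
                consumers.setdefault(up_name, []).append(t.name)
        for t in list(tasks.values()):
            if len(t.upstream) != 1:
                continue
            up = tasks[t.upstream[0]]
            if consumers[up.name] == [t.name] and up.partition_key == t.partition_key:
                merged = Unit(f"{up.name}+{t.name}", up.operations + t.operations,
                              t.partition_key, list(up.upstream))
                del tasks[up.name], tasks[t.name]
                tasks[merged.name] = merged
                # Re-point any consumer of the absorbed task at the merged task.
                for other in tasks.values():
                    other.upstream = [merged.name if n == t.name else n
                                      for n in other.upstream]
                changed = True
                break  # consumer map is stale; recompute it
    return list(tasks.values())


def build_plan(units):
    """Return (task name, operations) pairs in a dependency-respecting order."""
    tasks = fuse_chains(insert_repartitions(units))
    by_name = {t.name: t for t in tasks}
    graph = {t.name: set(t.upstream) for t in tasks}  # node -> predecessors
    order = TopologicalSorter(graph).static_order()
    return [(name, by_name[name].operations) for name in order]


units = [
    Unit("read", ["read_orders"], "order_id"),
    Unit("filter", ["drop_nulls"], "order_id", ["read"]),
    Unit("aggregate", ["sum_by_customer"], "customer_id", ["filter"]),
]
for name, ops in build_plan(units):
    print(name, ops)
```

In this example the change of key between `filter` and `aggregate` triggers one inserted repartition unit, and the fusing step then collapses the four units into two tasks, one per partitioning. A fuller model would let user-defined requirements influence both the fusing rule and the ordering, as the claim describes.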