Title: Modeling data exchange in a data flow of an extract, transform, and load (ETL) process
Abstract: Methods, systems, and computer program products for generating code from a data flow associated with an extract, transform, and load (ETL) process. In one implementation, the method includes identifying a data exchange requirement between a first operator and a second operator in the data flow. The first operator is a graphical object that represents a first data transformation step in the data flow and is associated with a first type of runtime engine, and the second operator is a graphical object that represents a second data transformation step in the data flow and is associated with a second type of runtime engine. The method further includes generating code to manage data staging between the first operator and the second operator in the data flow. The code exchanges data from a format associated with the first type of runtime engine to a format associated with the second type of runtime engine.
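The abstract's two steps can be illustrated with a minimal Python sketch. The first step detects a data exchange requirement between adjacent operators. The second generates staging code that converts one engine's format into the other's. Everything named here is hypothetical: the `Operator` class, the `row` and `column` engine labels, and the functions `needs_exchange` and `stage_rows_to_columns`. The patent does not say what format either runtime engine uses, so a row-to-column conversion is assumed purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Hypothetical engine types; the patent abstract does not name specific engines.
ROW_ENGINE = "row"        # assumed to emit a list of tuples (one tuple per row)
COLUMN_ENGINE = "column"  # assumed to consume a dict of column name -> list of values


@dataclass
class Operator:
    """A data transformation step bound to one type of runtime engine."""
    name: str
    engine: str


def needs_exchange(first: Operator, second: Operator) -> bool:
    """A data exchange requirement exists when adjacent operators use different engine types."""
    return first.engine != second.engine


def stage_rows_to_columns(
    columns: List[str], rows: List[Tuple[Any, ...]]
) -> Dict[str, List[Any]]:
    """Staging step: convert row-oriented output into the assumed column-oriented input format."""
    staged: Dict[str, List[Any]] = {col: [] for col in columns}
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"row {row!r} does not match columns {columns!r}")
        for col, value in zip(columns, row):
            staged[col].append(value)
    return staged
```

For example, with `extract = Operator("extract_orders", ROW_ENGINE)` and `aggregate = Operator("aggregate_totals", COLUMN_ENGINE)`, `needs_exchange(extract, aggregate)` is true. Calling `stage_rows_to_columns(["id", "amount"], [(1, 9.5), (2, 3.0)])` then yields `{"id": [1, 2], "amount": [9.5, 3.0]}`.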
Publication number: US8903762 (B2); publication date: 2014.12.02
Application number: US201213523217; filing date: 2012.06.14
Applicant: International Business Machines Corporation; inventors: Jin Qi; Liao Hui; Srinivasan Sriram; Xu Lin
Classification: G06F17/30; G06F9/44; main classification: G06F17/30
Agency: Patterson & Sheridan, LLP; agent: Patterson & Sheridan, LLP
Principal claim: 1. A computer-implemented method to generate code from a data flow associated with an extract, transform, and load (ETL) process, the computer implemented method comprising: converting the data flow into a logical operator graph representing the data flow and including a plurality of operators corresponding to a sequence of operations defined by the data flow for the associated ETL process, the plurality of operators including first and second operators representing data transformation steps in the data flow, wherein the first and second operators are associated with first and second types of runtime engine, respectively, wherein the logical operator graph is in turn converted into a query graph model; analyzing at least one of the logical operator graph and the query graph model in order to identify a data exchange requirement between the first and second operators from the data flow; modifying at least one of the data flow, the logical operator graph, and the query graph model based on the identified data exchange requirement and including inserting a data station operator between the first and second operators from the data flow, the data station operator representing a staging point in the data flow operable to exchange data from a format associated with the first type of runtime engine to a format associated with the second type of runtime engine; and subsequent to inserting the data station operator between the first and second operators from the data flow, generating an execution plan graph from the query graph model and by operation of one or more computer processors, including generating, based on the inserted data station operator, code to manage data staging between the first and second operators from the data flow.
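The claim describes a graph rewrite. A data station operator is inserted wherever two connected operators use different runtime engine types, and an execution plan that includes the staging step is then derived. The Python sketch below is a simplified, hypothetical illustration of that rewrite. It works on a plain node/edge list instead of the logical operator graph and query graph model named in the claim. The names `insert_data_stations` and `execution_plan`, the `station_...` node naming, and the `RUN`/`STAGE` step strings are invented for this example. Ordering the plan with Kahn's topological sort is also an assumption; the claim does not say how the plan is ordered.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def insert_data_stations(
    engines: Dict[str, str], edges: List[Edge]
) -> Tuple[Dict[str, str], List[Edge], Set[str]]:
    """Split every cross-engine edge with a (hypothetical) data station node."""
    new_engines = dict(engines)
    new_edges: List[Edge] = []
    stations: Set[str] = set()
    for src, dst in edges:
        if engines[src] != engines[dst]:
            station = f"station_{src}_to_{dst}"
            # Record the format conversion this staging point performs.
            new_engines[station] = f"{engines[src]}->{engines[dst]}"
            stations.add(station)
            new_edges.extend([(src, station), (station, dst)])
        else:
            new_edges.append((src, dst))
    return new_engines, new_edges, stations


def execution_plan(
    engines: Dict[str, str], edges: List[Edge], stations: Set[str]
) -> List[str]:
    """Order nodes topologically (Kahn's algorithm) and emit one plan step per node."""
    indegree = {node: 0 for node in engines}
    successors: Dict[str, List[str]] = {node: [] for node in engines}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(sorted(node for node, deg in indegree.items() if deg == 0))
    plan: List[str] = []
    while ready:
        node = ready.popleft()
        if node in stations:
            plan.append(f"STAGE {node} ({engines[node]})")
        else:
            plan.append(f"RUN {node} on {engines[node]}")
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(plan) != len(engines):
        raise ValueError("data flow graph contains a cycle")
    return plan
```

For a flow `extract -> transform -> load` where the first two operators run on a hypothetical `sql` engine and `load` runs on a hypothetical `parallel` engine, the rewrite adds `station_transform_to_load`. The plan then becomes `RUN extract on sql`, `RUN transform on sql`, `STAGE station_transform_to_load (sql->parallel)`, `RUN load on parallel`. Splitting the edge, rather than only annotating it, keeps the staging point a first-class node, so plan generation handles it like any other operator.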
Address: Armonk, NY, US