Title: Column based data transfer in extract, transform and load (ETL) systems
Abstract: Executing a plurality of transform stages in an extract, transform and load (ETL) job including, for each of the transform stages, receiving a plurality of input row identifiers (RIDs) corresponding to a first plurality of source database table rows in a source database table. Data is retrieved directly from a subset of the source database table columns in the first plurality of source database table rows based on the input RIDs and transform logic. Partial row data including data from the subset of the source database table columns is generated for each of the first plurality of source database table rows. Transformed data is generated based on the partial row data and the transform logic. Output RIDs corresponding to a second plurality of rows in the source database table that include at least a subset of the transformed data are output to a downstream stage.
Publication number: US9063992(B2); Publication date: 2015.06.23
Application number: US201313936508; Filing date: 2013.07.08
Applicant: International Business Machines Corporation; Inventors: Bhide Manish A.; Bonagiri Krishna K.; Mittapalli Srinivas K.
Classification: G06F17/30; Main classification: G06F17/30
Agency: Cantor Colburn LLP; Agent: Cantor Colburn LLP
Main claim: 1. A method comprising: executing a plurality of transform stages in an extract, transform and load (ETL) job, the ETL job including an extract stage and a load stage in addition to the plurality of transform stages, and the ETL job configured to access a source database table that includes data organized into source database table rows and source database table columns, the executing comprising for each transform stage: receiving, from an upstream stage, a plurality of input row identifiers (RIDs) corresponding to a first plurality of source database table rows in the source database table; retrieving data directly from a subset of the source database table columns in the first plurality of source database table rows in the source database table, the retrieving responsive to the input RIDs and to transform logic associated with the transform stage, wherein the subset of the source database table columns consists of one or more of the source database table columns that are required by one of the plurality of transform stages; generating partial row data for each of the first plurality of source database table rows, the partial row data comprising data from the subset of the source database table columns; generating transformed data responsive to the partial row data and to the transform logic; and outputting, to a downstream stage, a plurality of output RIDs corresponding to a second plurality of source database table rows that include at least a subset of the transformed data.
Address: Armonk, NY, US
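
The per-stage steps recited in the main claim (receive input RIDs, read only the required columns, build partial row data, transform, emit output RIDs) can be illustrated with a minimal Python sketch. This is not the patented implementation. The in-memory dictionary standing in for the source database table, the function `run_transform_stage`, the two example transforms, and the convention that a transform returns `None` to drop a row are all assumptions made for illustration. Writing the transformed values back into the same table is one possible reading of "rows that include at least a subset of the transformed data".

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

Row = Dict[str, Any]
Table = Dict[int, Row]  # hypothetical stand-in: RID -> full row of the source table


def run_transform_stage(
    table: Table,
    input_rids: Sequence[int],
    required_columns: Sequence[str],
    transform: Callable[[Row], Optional[Row]],
) -> List[int]:
    """Run one transform stage; only RIDs travel between stages, not rows."""
    output_rids: List[int] = []
    for rid in input_rids:
        # Retrieve data directly from the subset of columns this stage needs.
        partial_row = {col: table[rid][col] for col in required_columns}
        # Generate transformed data from the partial row and the stage logic.
        transformed = transform(partial_row)
        if transformed is None:
            continue  # assumption: None means the row is filtered out
        # Assumption: transformed values are stored with the row they came from.
        table[rid].update(transformed)
        output_rids.append(rid)
    return output_rids


def keep_adults(partial_row: Row) -> Optional[Row]:
    # Filter-style stage: adds no columns, only decides which RIDs pass on.
    return {} if partial_row["age"] >= 18 else None


def uppercase_name(partial_row: Row) -> Optional[Row]:
    # Derive a new column from the single column this stage reads.
    return {"name_upper": partial_row["name"].upper()}


source: Table = {
    1: {"name": "ann", "age": 34, "city": "Pune"},
    2: {"name": "bob", "age": 15, "city": "Delhi"},
    3: {"name": "cy", "age": 52, "city": "Armonk"},
}

# Two chained transform stages; the second receives the first stage's output RIDs.
rids = run_transform_stage(source, [1, 2, 3], ["age"], keep_adults)
rids = run_transform_stage(source, rids, ["name"], uppercase_name)
print(rids)
```

In this sketch neither stage ever reads the `city` column, and the only data handed from one stage to the next is the list of RIDs, which is the column-based transfer idea the abstract describes.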