Title of Invention DISTRIBUTED PROCESSING OF DATA RECORDS
Abstract Embodiments relate to distributed processing of data on a distributed database computer system. An aspect includes distributing rows of an arbitrary matrix to all of a plurality of processing units, wherein a number of the rows is equal to a number of the processing units, wherein one row of the arbitrary matrix is stored in each storage memory. Another aspect includes executing a first user defined function (UDF) on each processing unit, wherein a Cartesian product of each processing parameter matrix and the row of the arbitrary matrix is calculated on each processing unit and the matrix set is stored in the processor memory of each processing unit; and executing a second UDF on each processing unit having at least one data record after the executing of the first UDF, wherein all data records stored in the storage memory of the each processing unit are processed one by one.
Publication No. US2015120758(A1) Publication Date 2015.04.30
Application No. US201414514795 Filing Date 2014.10.15
Applicant International Business Machines Corporation Inventors Cichosz Pawel;Dendek Cezary;Draminski Michal;Klopotek Miezyslaw;Skowronski Krzysztof
Classification G06F17/30;G06F17/16 Main Classification G06F17/30
Agency Agent
Principal Claim 1. A computer implemented method for distributed processing of data on a distributed database computer system, the method comprising: distributing rows of an arbitrary matrix to all of a plurality of processing units, wherein a number of the rows is equal to a number of the processing units, wherein a matrix set is used to calculate a value set corresponding to each data record by using said data record as input, the matrix set comprising at least one processing parameter matrix, the value set comprising at least one calculation value, the distributed database computer system comprising the plurality of processing units connected in a share-nothing parallel processing architecture, wherein each processing unit comprises a processor of the each processing unit, a processor memory of the each processing unit, and a storage memory of the each processing unit, wherein the arbitrary matrix is stored in the distributed database in a way that one row of the arbitrary matrix is stored in each storage memory, wherein the data records and the matrix set are stored in a distributed database using the storage memories, each processor being operable for executing user defined functions (UDFs), calculating the value set corresponding to only one data record at a time, executing transaction processing, storing data in the processor memory, and using the data stored in the processor memory for execution of the UDFs within a framework of one transaction; and performing transaction processing in a framework of one transaction by: executing a first UDF on each processing unit, wherein a Cartesian product of the each processing parameter matrix and the row of the arbitrary matrix is calculated on each processing unit and as a result thereof the matrix set is stored in the processor memory of each processing unit; and executing a second UDF on each processing unit having at least one data record after the executing of the first UDF, wherein a number of repetitive executions of the second UDF on each processing unit is equal to the number of the data records stored in the storage memory of the each processing unit and all data records stored in the storage memory of the each processing unit are processed one by one, wherein the value set corresponding to the data record is calculated using the matrix set stored in the processor memory of said respective processing unit.
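The two-UDF flow recited in the claim can be illustrated with a minimal single-process sketch. This is not the applicant's implementation: the names `ProcessingUnit`, `distribute_rows`, `first_udf`, `second_udf`, and `run_transaction` are hypothetical, the Cartesian product is modeled as a cross join of each parameter matrix with the single locally stored row (which leaves a full copy of every matrix on every unit), and the value set is assumed to be one matrix-vector product per parameter matrix, since the claim does not specify the calculation.

```python
from itertools import product


class ProcessingUnit:
    """Hypothetical model of one node in a share-nothing system."""

    def __init__(self, records):
        # Storage memory: persistent data owned by this unit only.
        self.storage = {"records": list(records), "arbitrary_row": None}
        # Processor memory: scratch space that lives for one transaction.
        self.memory = {}


def distribute_rows(units, arbitrary_matrix):
    """Store exactly one row of the arbitrary matrix on each unit."""
    if len(arbitrary_matrix) != len(units):
        raise ValueError("number of rows must equal number of processing units")
    for unit, row in zip(units, arbitrary_matrix):
        unit.storage["arbitrary_row"] = row


def first_udf(unit, parameter_matrices):
    """Cross join each parameter matrix with the local row.

    Because the local side has exactly one row, the join yields every
    matrix row once, so the whole matrix set lands in processor memory.
    """
    local = [unit.storage["arbitrary_row"]]
    unit.memory["matrix_set"] = [
        [matrix_row for matrix_row, _ in product(matrix, local)]
        for matrix in parameter_matrices
    ]


def second_udf(unit, record):
    """Compute the value set for one record (assumed: matrix-vector products)."""
    return [
        [sum(w * x for w, x in zip(row, record)) for row in matrix]
        for matrix in unit.memory["matrix_set"]
    ]


def run_transaction(units, parameter_matrices):
    """Run both UDFs within one simulated transaction."""
    for unit in units:
        first_udf(unit, parameter_matrices)
    results = []
    for unit in units:
        # One second-UDF call per stored record; units without records are skipped.
        for record in unit.storage["records"]:
            results.append(second_udf(unit, record))
    for unit in units:
        unit.memory.clear()  # processor memory does not outlive the transaction
    return results
```

In this sketch the arbitrary matrix carries no information of its own; its only role is to guarantee that every unit holds one row to join against, so that the first UDF runs on every unit, including units that store no data records.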
Address Armonk NY US