Title: Method and structure for fast in-place transformation of standard full and packed matrix data formats
Abstract: A method and structure for an in-place transformation of matrix data. For a matrix A stored in either a standard full format or a packed format, and a transformation T having a compact representation, blocking parameters MB and NB are chosen based on a cache size. A sub-matrix A1 of A, of size M1=m*MB by N1=n*NB, is processed, and any residual remainder of A is saved in a buffer B. Sub-matrix A1 is processed in three stages. First, A1 is contiguously moved and transformed in place into a New Data Structure (NDS). Next, the transformation T is applied to the NDS format of A1 in units of MB*NB contiguous double words, thereby replacing A1 with the contents of T(A1). Finally, NDS T(A1) is moved and transformed back to standard-format T(A1), leaving holes for the remainder of A held in buffer B. The contents of buffer B are then contiguously copied into the holes of A2, yielding the in-place transformed matrix T(A).
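To make the "New Data Structure" step concrete, here is a minimal Python sketch. It assumes one plausible NDS layout that is not taken from the patent: each MB×NB block of the column-major M1×N1 matrix A1 is stored contiguously in column-major order, and blocks follow one another block column by block column. The helpers `nds_index` and `to_nds_inplace` are hypothetical. The reordering is done in place by following permutation cycles. The `done` flags cost one flag per element, so this only illustrates the data movement and is not the patent's own in-place conversion method.

```python
def nds_index(i, j, M1, MB, NB):
    """Offset of element (i, j) of an M1 x N1 matrix in an ASSUMED NDS layout:
    MB x NB blocks stored contiguously (column-major inside each block),
    ordered block column by block column."""
    m = M1 // MB                      # number of block rows
    bi, ii = divmod(i, MB)            # block row, row inside the block
    bj, jj = divmod(j, NB)            # block column, column inside the block
    block = bj * m + bi               # position of the block in the sequence
    return block * MB * NB + jj * MB + ii


def to_nds_inplace(a, M1, N1, MB, NB):
    """Reorder column-major A1 (element (i, j) at a[j*M1 + i]) into the assumed
    NDS layout in place by following the cycles of the permutation.
    The `done` list (one flag per element) is a simplification for this
    sketch, not the patent's technique."""
    assert M1 % MB == 0 and N1 % NB == 0, "A1 must be an exact multiple of the block size"
    size = M1 * N1
    done = [False] * size
    for start in range(size):
        if done[start]:
            continue
        carried = a[start]            # value waiting to be stored at its new home
        k = start
        while True:
            j, i = divmod(k, M1)      # column-major position k -> (i, j)
            d = nds_index(i, j, M1, MB, NB)
            carried, a[d] = a[d], carried   # store value, pick up the displaced one
            done[d] = True
            k = d
            if d == start:            # cycle closed
                break
    return a


if __name__ == "__main__":
    M1, N1, MB, NB = 4, 6, 2, 3
    a = list(range(M1 * N1))          # a[j*M1 + i] holds the label of element (i, j)
    to_nds_inplace(a, M1, N1, MB, NB)
    print(a[:MB * NB])                # first block, elements (0..1, 0..2) -> [0, 1, 4, 5, 8, 9]
```

After the call, the six elements of the leading 2×3 block, which were spread over three columns of A1, occupy six consecutive words. That contiguity is what allows T to be applied one cache-resident MB×NB block at a time.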
Application publication number: US9213680(B2)  Publication date: 2015.12.15
Application number: US200711849272  Filing date: 2007.09.01
Applicant: International Business Machines Corporation  Inventors: Gustavson Fred Gehrung; Gunnels John A.; Sexton James C.
Classification: G06F17/16; G06F7/78; G06F12/02  Main classification: G06F17/16
Agency: McGinn IP Law Group, PLLC  Agents: Morris Daniel P.; McGinn IP Law Group, PLLC
Main claim:
1. A computerized method for an in-place transformation of matrix data, said method comprising:
for a matrix A having a size M×N, as stored in a memory of a computer in one of a standard full format or a packed format in one of a column major format or a row major format, and for a transformation T having a compact representation, choosing blocking parameters MB and NB based on a cache size of the computer such that MB*NB lies between an L1 cache size and an L2 cache size, and using a processor on the computer to perform the steps of:
determining a size M1×N1 for blocking matrix A into a plurality of sub-matrices, and values m and n, such that M1=m*MB and N1=n*NB;
determining whether any residual exists in said matrix A data if matrix A is blocked into sub-matrices, by determining whether either r or q is greater than zero, where M=m*MB+r and N=n*NB+q, and, if so, allocating at least one buffer area in memory and moving any said residuals r and/or q, respectively, into said at least one buffer area;
executing a contraction processing on data of matrix A1=A(0:M1−1,0:N1−1) to convert said data in-place to an array space A(0:M1*N1−1);
converting said array space A(0:M1*N1−1) into a New Data Structure (NDS) matrix A1 wherein said data is stored in memory as contiguous data in increments of blocks of said size MB×NB;
transforming, in-place, said matrix A1 in NDS format by sequentially reading into the cache, transforming, and storing each MB×NB block;
executing an expansion processing to convert said transformed matrix A1 in NDS format back into said one of column major format or row major format, leaving a hole or holes to replace the residual data of said at least one buffer area; and
executing an out-of-place transformation of contents of said at least one buffer area and storing the out-of-place transformed data into said hole or holes.
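The blocking arithmetic and the contraction and expansion steps of the claim can be illustrated for the column-major case with the following Python sketch. All function names are hypothetical, and the matrix is modeled as a flat Python list with leading dimension M. The NDS conversion and the transformation T are deliberately left out, so a contract/expand round trip simply restores A. Contraction moves columns toward lower addresses and therefore runs in ascending order. Expansion moves them back toward higher addresses and therefore runs in descending order. This ordering is what lets both moves overwrite the array safely without a second copy of A1.

```python
def blocking(M, N, MB, NB):
    """Return m, n, M1, N1, r, q with M = m*MB + r and N = n*NB + q."""
    m, r = divmod(M, MB)
    n, q = divmod(N, NB)
    return m, n, m * MB, n * NB, r, q


def save_residual(a, M, N, M1, N1):
    """Copy the residual of column-major A (rows M1..M-1 of every column, and
    rows 0..M1-1 of columns N1..N-1) into two buffers, column by column."""
    rows = [a[j * M + i] for j in range(N) for i in range(M1, M)]
    cols = [a[j * M + i] for j in range(N1, N) for i in range(M1)]
    return rows, cols


def contract_inplace(a, M, M1, N1):
    """A1 = A(0:M1-1, 0:N1-1) with leading dimension M -> a[0:M1*N1] with
    leading dimension M1. Destinations never lie past unread sources when
    columns and rows are visited in ascending order."""
    for j in range(N1):
        for i in range(M1):
            a[j * M1 + i] = a[j * M + i]


def expand_inplace(a, M, M1, N1):
    """Inverse of contract_inplace: descending order so that no source is
    overwritten before it is read. Residual positions are left as holes."""
    for j in reversed(range(N1)):
        for i in reversed(range(M1)):
            a[j * M + i] = a[j * M1 + i]


def restore_residual(a, M, N, M1, N1, rows, cols):
    """Write the buffered residual back into the holes (T omitted here)."""
    it = iter(rows)
    for j in range(N):
        for i in range(M1, M):
            a[j * M + i] = next(it)
    it = iter(cols)
    for j in range(N1, N):
        for i in range(M1):
            a[j * M + i] = next(it)


if __name__ == "__main__":
    M, N, MB, NB = 7, 5, 2, 2
    a = list(range(M * N))                         # a[j*M + i] labels element (i, j)
    m, n, M1, N1, r, q = blocking(M, N, MB, NB)    # (3, 2, 6, 4, 1, 1)
    rows, cols = save_residual(a, M, N, M1, N1)
    contract_inplace(a, M, M1, N1)                 # A1 now fills a[0:M1*N1]
    # The NDS conversion and the transformation T would run here.
    expand_inplace(a, M, M1, N1)
    restore_residual(a, M, N, M1, N1, rows, cols)
    print(a == list(range(M * N)))                 # -> True
```

Here the residual is kept in two buffers, one for the r trailing rows and one for the q trailing columns. The claim only requires "at least one buffer area", so a single combined buffer would serve equally well.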
Address: Armonk, NY, US