Title: Reducing bandwidth requirements for matrix multiplication
Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. The mechanism increases the block size and divides each block into sub-blocks. By reversing the visitation order, the mechanism eliminates a sub-block load at each corner turn. The mechanism performs sub-block matrix multiplication for each sub-block in a given block, and then repeats the operation for the next block until all blocks are computed. The mechanism may determine the block size and sub-block size to optimize load balancing and memory bandwidth. The mechanism therefore reduces the maximum memory throughput required and increases performance. It also reduces the number of multi-buffered local store buffers.
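The reversal at corner turns described in the abstract can be illustrated with a minimal sketch. It is a simplified, single-level model and not the patented implementation: it assumes square matrices whose size is a multiple of the block size, a hypothetical local store that holds one block of each operand, and a simple counter for block loads. The function name `blocked_matmul` and the flag `reverse_at_turns` are illustrative names only.

```python
def blocked_matmul(A, B, bs, reverse_at_turns=True):
    """Multiply square matrices A and B (lists of lists) in bs x bs blocks.

    Returns (C, loads). `loads` counts how often a block of A or B must be
    fetched into an assumed local store holding one block per operand.
    Assumes bs divides the matrix size evenly.
    """
    n = len(A)
    nb = n // bs                       # number of blocks per dimension
    C = [[0] * n for _ in range(n)]
    held_a = held_b = None             # coordinates of the resident blocks
    loads = 0
    visited = 0                        # number of C blocks visited so far

    for bi in range(nb):
        # Serpentine walk over block columns: odd block rows run backwards,
        # so the block column is unchanged when moving to the next block row.
        cols = list(range(nb))
        if reverse_at_turns and bi % 2 == 1:
            cols.reverse()
        for bj in cols:
            # Reverse the inner (k) order on every other C block, so each
            # C block starts at the k index where the previous one ended.
            ks = list(range(nb))
            if reverse_at_turns and visited % 2 == 1:
                ks.reverse()
            visited += 1
            for bk in ks:
                if held_a != (bi, bk):     # A block not resident: load it
                    held_a = (bi, bk)
                    loads += 1
                if held_b != (bk, bj):     # B block not resident: load it
                    held_b = (bk, bj)
                    loads += 1
                # Multiply-accumulate one block pair into the C block.
                for i in range(bi * bs, (bi + 1) * bs):
                    for j in range(bj * bs, (bj + 1) * bs):
                        acc = 0
                        for k in range(bk * bs, (bk + 1) * bs):
                            acc += A[i][k] * B[k][j]
                        C[i][j] += acc
    return C, loads
```

Under these assumptions, multiplying 4x4 matrices with `bs=2` costs 16 block loads in the fixed order (`reverse_at_turns=False`) and 13 with the reversal, one load saved at each of the three turns between consecutive C blocks. The result matrix is identical in both cases, since only the order of accumulation changes.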
Publication number: US8250130(B2)  Publication date: 2012.08.21
Application number: US20080129789  Application date: 2008.05.30
Applicant: BROKENSHIRE DANIEL A.; GUNNELS JOHN A.; KISTLER MICHAEL D.; INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventor: BROKENSHIRE DANIEL A.; GUNNELS JOHN A.; KISTLER MICHAEL D.
Classification: G06F17/16  Main classification: G06F17/16
Agency:  Agent:
Principal claim:
Address: