Title of invention: Increasing parallel program performance for irregular memory access problems with virtual data partitioning and hierarchical collectives
Abstract: A method for increasing performance of an operation on a distributed memory machine is provided. Asynchronous parallel steps in the operation are transformed into synchronous parallel steps. The synchronous parallel steps of the operation are rearranged to generate an altered operation that schedules memory accesses for increasing locality of reference. The altered operation that schedules memory accesses for increasing locality of reference is mapped onto the distributed memory machine. Then, the altered operation is executed on the distributed memory machine to simulate local memory accesses with virtual threads to check cache performance within each node of the distributed memory machine.
Publication number: US8869155(B2)  Publication date: 2014.10.21
Application number: US201012945488  Filing date: 2010.11.12
Applicant: International Business Machines Corporation  Inventors: Almasi George; Cong Guojing; Klepacki David J.; Saraswat Vijay A.
Classification: G06F9/46; G06F9/52  Main classification: G06F9/46
Agency: Yee & Associates, P.C.  Agent: Yee & Associates, P.C.; Dougherty Anne
Main claim: 1. A computer implemented method for increasing performance of an operation on a distributed memory machine, the computer implemented method comprising: transforming asynchronous parallel steps in the operation into synchronous parallel steps by analyzing a sequence of steps executed by each processor in a plurality of processors, dividing each processor step in the sequence of steps into a chunk of instructions so that each shared memory access is a particular instruction chunk, aligning processor instruction chunks from each thread in a plurality of threads, and introducing artificial synchronization among the processor instruction chunks by inserting dummy processor instructions where required to facilitate alignment of the processor instruction chunks; rearranging the synchronous parallel steps of the operation to generate an altered operation that schedules memory accesses for increasing locality of reference by partitioning a target memory access array into a plurality of blocks, and assigning each block in the target memory access array to a different virtual thread in a plurality of threads; mapping the altered operation that schedules memory accesses for increasing locality of reference onto the distributed memory machine; and executing the altered operation on the distributed memory machine to simulate local memory accesses with virtual threads to check cache performance within each node of the distributed memory machine.
Address: Armonk NY US
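
Illustrative sketch (not part of the patent record): the two transformations named in the main claim might be pictured roughly as below. This is a simplified reading under stated assumptions, not the claimed implementation. The names `NOP`, `align_chunks`, `block_of`, and `schedule_by_block` are hypothetical; padding is applied only at the end of each thread's chunk sequence, whereas the claim inserts dummy instructions wherever alignment requires; and equal-sized contiguous blocks are assumed for the partitioning.

```python
from typing import List, Optional, Sequence

# Hypothetical stand-in for a "dummy processor instruction".
NOP: Optional[str] = None


def align_chunks(per_thread_chunks: Sequence[Sequence[str]]) -> List[List[Optional[str]]]:
    """Pad every thread's chunk sequence with NOPs to a common length.

    After padding, step i of every thread can be treated as one
    synchronous parallel step (assumption: end-padding is sufficient).
    """
    steps = max((len(chunks) for chunks in per_thread_chunks), default=0)
    return [list(chunks) + [NOP] * (steps - len(chunks)) for chunks in per_thread_chunks]


def block_of(index: int, array_len: int, num_blocks: int) -> int:
    """Return the block that owns `index` when the target array is split
    into `num_blocks` contiguous blocks of (nearly) equal size."""
    block_size = -(-array_len // num_blocks)  # ceiling division
    return index // block_size


def schedule_by_block(accesses: Sequence[int], array_len: int,
                      num_virtual_threads: int) -> List[List[int]]:
    """Group irregular accesses by owning block, one block per virtual thread,
    so each virtual thread touches only one contiguous region of the array."""
    buckets: List[List[int]] = [[] for _ in range(num_virtual_threads)]
    for idx in accesses:
        buckets[block_of(idx, array_len, num_virtual_threads)].append(idx)
    return buckets


if __name__ == "__main__":
    print(align_chunks([["load a", "store b", "load c"], ["load x"]]))
    print(schedule_by_block([9, 0, 5, 2, 7, 1], array_len=10, num_virtual_threads=2))
```

Grouping accesses by block is what gives each virtual thread a contiguous working set; how blocks are then mapped onto physical nodes is left out of this sketch.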