Title of Invention: Compiler optimization for many integrated core processors
Abstract: Systems and methods for source-to-source transformation for compiler optimization for many integrated core (MIC) coprocessors. The methods include identifying data dependencies in candidate loops and the array data elements used in each iteration; profiling the candidate loops to find a suitable number m such that data transfer and computation for m iterations take an equal amount of time; and creating an outer loop around the candidate loop, each iteration of which executes m iterations of the candidate loop. Data streaming is performed by determining an optimum buffer size for one or more arrays and inserting code before the outer loop to create buffers of that size; overlapping data transfer between central processing units (CPUs) and MICs with computation; reusing buffers to reduce the memory used on the MICs; and reusing threads on the MICs to repeatedly launch kernels for asynchronous data transfer.
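The loop restructuring described in the abstract can be illustrated with a small Python sketch. It assumes profiling has already produced measured transfer and compute times for a few candidate block sizes. `pick_block_size` picks the profiled m at which the two costs are closest to equal, and `strip_mined` shows the outer loop that executes m iterations of the original candidate loop per iteration. Both function names and the profiling numbers are hypothetical; this is not the patented transformation itself, which rewrites C/C++ source for the MIC.

```python
from typing import Callable, Dict


def pick_block_size(transfer_ms: Dict[int, float], compute_ms: Dict[int, float]) -> int:
    """Return the profiled block size m whose transfer and compute times are closest.

    Both dicts map a candidate m to a measured time in milliseconds
    (hypothetical profiling data; the patent does not specify this format).
    """
    candidates = sorted(set(transfer_ms) & set(compute_ms))
    if not candidates:
        raise ValueError("no block size was profiled for both transfer and compute")
    # min() keeps the first minimal element, so ties go to the smaller m.
    return min(candidates, key=lambda m: abs(transfer_ms[m] - compute_ms[m]))


def strip_mined(n: int, m: int, body: Callable[[int], None]) -> None:
    """Run `for i in range(n): body(i)` as an outer loop over blocks of m iterations."""
    if m <= 0:
        raise ValueError("block size m must be positive")
    for start in range(0, n, m):          # outer loop: one block of the candidate loop
        stop = min(start + m, n)          # the last block may be shorter than m
        for i in range(start, stop):      # original candidate loop, at most m iterations
            body(i)
```

With made-up profiles `{64: 2.0, 128: 3.1, 256: 5.0}` (transfer) and `{64: 1.5, 128: 3.0, 256: 6.2}` (compute), `pick_block_size` returns 128, where the two costs differ by only 0.1 ms. Balancing the two costs matters because, once transfer and compute overlap, the slower of the two sets the pace of each block.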
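Publication No.: US9471289 (B2)
Publication Date: 2016.10.18
Application No.: US201514667819
Filing Date: 2015.03.25
Applicant: NEC Corporation
Inventors: Feng Min; Chakradhar Srimat; Song Linhai
Classification: G06F9/45
Main Classification: G06F9/45
Agency: none listed
Agent: Kolodka Joseph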
Main Claim:
1. A method for source-to-source transformation for compiler optimization for one or more many integrated core (MIC) coprocessors, comprising:
  identifying data dependencies in one or more candidate loops and data elements used in each iteration for one or more arrays;
  profiling the one or more candidate loops to find a proper number m, wherein data transfer and computation for m iterations take an equal amount of time;
  creating an outer loop outside the candidate loop, wherein each iteration of the outer loop executes m iterations of the candidate loop; and
  performing data streaming, wherein the data streaming comprises:
    determining an optimum buffer size for one or more arrays, and inserting code before the outer loop to create one or more optimum-sized buffers;
    overlapping data transfer between one or more central processing units (CPUs) and the MICs with the computation to hide data transfer overhead;
    reusing the buffers to reduce memory employed on the MICs during the data transfer; and
    reusing threads on the MICs to repeatedly launch kernels on the MICs for asynchronous data transfer.
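The data-streaming steps of the claim (fixed-size buffers, overlapping transfer with computation, buffer reuse, and thread reuse) can be simulated on the host with Python threads. This is only an illustration under stated assumptions: copying into a list stands in for a CPU-to-MIC transfer, and a single long-lived `PersistentWorker` thread stands in for the reused MIC threads that run each kernel launch. Two buffers of size m are allocated once and alternated (double buffering), so that the next block can be "transferred" while the previous one is computed. `PersistentWorker` and `stream_map` are hypothetical names that do not correspond to any MIC offload API. Because of Python's GIL, the overlap is structural rather than a real speedup.

```python
import queue
import threading
from typing import Callable, List, Optional, Sequence


class PersistentWorker:
    """One long-lived thread that runs every 'kernel launch' (simulated thread reuse)."""

    def __init__(self) -> None:
        self._tasks: queue.Queue = queue.Queue()
        self.error: Optional[Exception] = None
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        while True:
            task = self._tasks.get()
            if task is None:                  # shutdown sentinel
                return
            fn, done = task
            try:
                fn()
            except Exception as exc:          # keep the thread alive; report later
                self.error = exc
            finally:
                done.set()                    # signal completion of this launch

    def launch(self, fn: Callable[[], None]) -> threading.Event:
        """Queue fn asynchronously and return an event set when it finishes."""
        done = threading.Event()
        self._tasks.put((fn, done))
        return done

    def close(self) -> None:
        self._tasks.put(None)
        self._thread.join()


def stream_map(data: Sequence[float], m: int, kernel: Callable[[float], float]) -> List[float]:
    """Apply kernel element-wise, streaming data in blocks of m through two reused buffers."""
    if m <= 0:
        raise ValueError("block size m must be positive")
    buffers = [[0.0] * m, [0.0] * m]          # allocated once, reused for every block
    pending: List[Optional[threading.Event]] = [None, None]
    out = [0.0] * len(data)
    worker = PersistentWorker()
    try:
        for block, start in enumerate(range(0, len(data), m)):
            slot = block % 2
            if pending[slot] is not None:
                pending[slot].wait()          # do not overwrite a buffer still being read
            count = min(m, len(data) - start)
            buf = buffers[slot]
            buf[:count] = data[start:start + count]   # simulated host-to-coprocessor transfer

            def run(buf: List[float] = buf, start: int = start, count: int = count) -> None:
                for j in range(count):
                    out[start + j] = kernel(buf[j])   # simulated kernel on the coprocessor

            # Asynchronous launch: the loop proceeds to the next transfer immediately.
            pending[slot] = worker.launch(run)
        for ev in pending:
            if ev is not None:
                ev.wait()
    finally:
        worker.close()
    if worker.error is not None:
        raise worker.error
    return out
```

For example, `stream_map([1.0, 2.0, 3.0, 4.0, 5.0], 2, lambda x: x * 10)` returns `[10.0, 20.0, 30.0, 40.0, 50.0]`. With this design, buffer memory stays at 2·m elements regardless of array length, and one thread serves every launch instead of a new thread being created per block. These are the two costs the claim's buffer-reuse and thread-reuse steps target.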
Address: JP