Title of invention Algorithm Selection For Collective Operations In A Parallel Computer
Abstract Algorithm selection for collective operations in a parallel computer that includes a plurality of compute nodes may include: profiling a plurality of algorithms for each of a set of collective operations, including for each collective operation: executing the operation a plurality of times with each execution varying one or more of: geometry, message size, data type, and algorithm to effect the collective operation, thereby generating performance metrics for each execution; storing the performance metrics in a performance profile; at load time of a parallel application including a plurality of parallel processes configured in a particular geometry, filtering the performance profile in dependence upon the particular geometry; during run-time of the parallel application, selecting, for at least one collective operation, an algorithm to effect the operation in dependence upon characteristics of the parallel application and the performance profile; and executing the operation using the selected algorithm.
Publication number US2014282429(A1) Publication date 2014.09.18
Application number US201313798619 Filing date 2013.03.13
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors Archer Charles J.;Carey James E.;Sanders Philip J.;Smith Brian E.
Classification G06F11/34 Main classification G06F11/34
Agency Agent
Main claim 1. A method of algorithm selection for collective operations in a parallel computer comprising a plurality of compute nodes, each compute node configured to execute one or more parallel processes of a parallel application, the method comprising: profiling a plurality of algorithms for each of a set of collective operations, including for each collective operation in the set: executing the collective operation a plurality of times with each execution varying one or more of: geometry, message size, data type, and algorithm to effect the collective operation, thereby generating performance metrics for each execution; storing the performance metrics in a performance profile; at load time of a parallel application including a plurality of parallel processes configured in a particular geometry, filtering the performance profile in dependence upon the particular geometry; during run-time of the parallel application, selecting, for at least one collective operation, an algorithm to effect the collective operation in dependence upon one or more characteristics of the parallel application and the filtered performance profile; and executing the collective operation using the selected algorithm.
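The claimed steps (profile, store, filter at load time, select at run time) can be illustrated with a minimal Python sketch. This is not the patent's implementation: the names `ProfileKey`, `build_profile`, `filter_by_geometry`, `select_algorithm`, and `synthetic_cost` are hypothetical, the "ring" and "tree" algorithm labels are placeholders, and the cost model is an invented stand-in for timing real collective operations on real hardware. Choosing the nearest profiled message size is also an assumption; the claim only requires that selection depend on application characteristics and the filtered profile.

```python
import math
from itertools import product
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple


class ProfileKey(NamedTuple):
    """One profiled configuration (all field names are illustrative)."""
    operation: str
    geometry: Tuple[int, ...]
    message_size: int
    data_type: str
    algorithm: str


Profile = Dict[ProfileKey, float]


def build_profile(operations: Dict[str, List[str]],
                  geometries: Iterable[Tuple[int, ...]],
                  message_sizes: Iterable[int],
                  data_types: Iterable[str],
                  measure: Callable[[ProfileKey], float]) -> Profile:
    """Profiling step: run every combination once and store its metric."""
    geometries, message_sizes, data_types = (
        list(geometries), list(message_sizes), list(data_types))
    profile: Profile = {}
    for op, algorithms in operations.items():
        for geom, size, dtype, alg in product(
                geometries, message_sizes, data_types, algorithms):
            key = ProfileKey(op, geom, size, dtype, alg)
            profile[key] = measure(key)  # lower is better in this sketch
    return profile


def filter_by_geometry(profile: Profile, geometry: Tuple[int, ...]) -> Profile:
    """Load-time step: keep only entries for the application's geometry."""
    return {k: v for k, v in profile.items() if k.geometry == geometry}


def select_algorithm(filtered: Profile, operation: str,
                     message_size: int, data_type: str) -> str:
    """Run-time step: pick the cheapest algorithm for this call."""
    candidates = [(k, v) for k, v in filtered.items()
                  if k.operation == operation and k.data_type == data_type]
    if not candidates:
        raise LookupError("no profile entry for this operation and data type")
    # Assumption: use the profiled message size closest to the actual one.
    sizes = sorted({k.message_size for k, _ in candidates})
    nearest = min(sizes, key=lambda s: abs(s - message_size))
    best_key, _ = min(((k, v) for k, v in candidates
                       if k.message_size == nearest),
                      key=lambda kv: kv[1])
    return best_key.algorithm


def synthetic_cost(key: ProfileKey) -> float:
    """Invented cost model standing in for a real timed execution."""
    nodes = math.prod(key.geometry)
    if key.algorithm == "ring":
        return 2.0 * nodes + 0.001 * key.message_size
    return 5.0 * math.log2(nodes) + 0.01 * key.message_size  # "tree"


profile = build_profile({"allreduce": ["ring", "tree"]},
                        geometries=[(2, 2), (4, 4)],
                        message_sizes=[64, 65536],
                        data_types=["int32", "float64"],
                        measure=synthetic_cost)
filtered = filter_by_geometry(profile, (4, 4))
print(select_algorithm(filtered, "allreduce", 100, "float64"))
print(select_algorithm(filtered, "allreduce", 50000, "int32"))
```

Filtering once at load time keeps the run-time lookup small, since the geometry of a loaded application is fixed while message size and data type can change from call to call.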
Address US