Title: Algorithm selection for collective operations in a parallel computer
Abstract: Algorithm selection for collective operations in a parallel computer that includes a plurality of compute nodes may include: profiling a plurality of algorithms for each of a set of collective operations, including for each collective operation: executing the operation a plurality of times with each execution varying one or more of: geometry, message size, data type, and algorithm to effect the collective operation, thereby generating performance metrics for each execution; storing the performance metrics in a performance profile; at load time of a parallel application including a plurality of parallel processes configured in a particular geometry, filtering the performance profile in dependence upon the particular geometry; during run-time of the parallel application, selecting, for at least one collective operation, an algorithm to effect the operation in dependence upon characteristics of the parallel application and the performance profile; and executing the operation using the selected algorithm.
Publication number: US9208052(B2) Publication date: 2015.12.08
Application number: US201313798619 Filing date: 2013.03.13
Applicant: International Business Machines Corporation Inventors: Archer Charles J.;Carey James E.;Sanders Philip J.;Smith Brian E.
Classification: G06F9/44;G06F11/34;G06F11/07;G06F9/54 Main classification: G06F9/44
Agency: Kennedy Lenart Spraggins LLP Agents: Lenart Edward J.;Johnson Grant A.;Kennedy Lenart Spraggins LLP
Main claim: 1. A method of algorithm selection for collective operations in a parallel computer comprising a plurality of compute nodes, each compute node configured to execute one or more parallel processes of a parallel application, the method comprising: profiling a plurality of algorithms for each of a set of collective operations, including for each collective operation in the set: executing the collective operation a plurality of times with each execution varying one or more of: geometry, message size, data type, and algorithm to effect the collective operation, thereby generating performance metrics for each execution; storing the performance metrics in a performance profile, wherein the performance profile comprises a plurality of entries; at load time of a parallel application including a plurality of parallel processes configured in a particular geometry, filtering the entries of the performance profile in dependence upon the particular geometry; during run-time of the parallel application, selecting, for at least one collective operation, an entry for an algorithm from the filtered entries of the performance profile to effect the collective operation in dependence upon one or more characteristics of the parallel application and the filtered performance profile; executing the collective operation using the selected algorithm; monitoring, during run time, performance metrics of the selected algorithm, wherein monitoring performance metrics of the selected algorithm comprises identifying a failure during execution of the collective operation; and updating the performance profile with the monitored performance metrics, wherein updating the performance profile with the monitored performance metrics comprises removing the selected entry for the algorithm in the performance profile.
Address: Armonk NY US
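
Illustrative sketch (not part of the patent record): the steps recited in the main claim, namely filtering the profile by geometry at load time, selecting an entry at run time, and removing the selected entry when a failure is identified, might be modeled roughly as below. Every name here (`ProfileEntry`, `filter_by_geometry`, `select_entry`, `run_collective`, the `latency_us` metric, and the "closest message size, then lowest latency" selection rule) is a hypothetical assumption made for illustration. The patent text does not specify a data layout, a metric, or a selection rule, and the profiling step that would populate the entries is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ProfileEntry:
    """One hypothetical profile row: one measured execution of a collective."""
    operation: str      # e.g. "allreduce"
    geometry: str       # label for the process layout used while profiling
    message_size: int   # bytes
    data_type: str
    algorithm: str
    latency_us: float   # assumed performance metric


def filter_by_geometry(profile: List[ProfileEntry], geometry: str) -> List[ProfileEntry]:
    """Load-time step: keep only entries profiled on the application's geometry."""
    return [e for e in profile if e.geometry == geometry]


def select_entry(filtered: List[ProfileEntry], operation: str,
                 message_size: int, data_type: str) -> ProfileEntry:
    """Run-time step: choose an entry for this call.

    Assumed rule: closest profiled message size first, then lowest latency.
    """
    candidates = [e for e in filtered
                  if e.operation == operation and e.data_type == data_type]
    if not candidates:
        raise LookupError(f"no profiled algorithm for {operation}/{data_type}")
    return min(candidates,
               key=lambda e: (abs(e.message_size - message_size), e.latency_us))


def run_collective(profile: List[ProfileEntry], filtered: List[ProfileEntry],
                   implementations: Dict[str, Callable[[list], object]],
                   operation: str, message_size: int, data_type: str,
                   payload: list) -> object:
    """Execute with the selected algorithm; on failure, drop its entry."""
    entry = select_entry(filtered, operation, message_size, data_type)
    try:
        return implementations[entry.algorithm](payload)
    except RuntimeError:
        # A failure was identified during execution: update the profile by
        # removing the selected entry so it is not chosen again.
        profile.remove(entry)
        filtered.remove(entry)
        raise
```

In this sketch the caller decides what to do after a failure. Calling `run_collective` again would select the next-best remaining entry, a retry policy the claim itself does not describe.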