Title: Identifying data communications algorithms of all other tasks in a single collective operation in a distributed processing system
Abstract: Topology mapping in a distributed processing system, the distributed processing system including a plurality of compute nodes, each compute node having a plurality of tasks, each task assigned a unique rank, including: assigning each task to a geometry defining the resources available to the task; selecting, from a list of possible data communications algorithms, one or more algorithms configured for the assigned geometry; and identifying, by each task to all other tasks, the selected data communications algorithms of each task in a single collective operation.
Publication No.: US9229780(B2)  Publication Date: 2016.01.05
Application No.: US201213667302  Filing Date: 2012.11.02
Applicant: International Business Machines Corporation  Inventors: Archer Charles J.; Carey James E.; Markland Matthew W.; Sanders Philip J.
Classification: G06F9/46; G06F9/50; G06F9/54  Main Classification: G06F9/46
Agency: Kennedy Lenart Spraggins LLP  Agents: Lenart Edward J.; Johnson Grant A.; Kennedy Lenart Spraggins LLP
Main Claim: 1. A method of topology mapping in a distributed processing system, the distributed processing system including a plurality of compute nodes, each compute node executing a plurality of tasks, each task assigned a unique rank, the method comprising: assigning each task to a geometry, wherein the geometry comprises a collection of compute nodes having distinct data communications abilities and resources available to tasks assigned to the geometry, wherein data communications abilities include one or more data communications algorithms supported by each compute node; selecting, by each task from a list of possible data communications algorithms, one or more of the possible data communications algorithms supported by the compute node upon which the task is executing in the assigned geometry; identifying, by each task, the selected data communications algorithms of all other tasks in a single collective operation, wherein the identification includes the steps of: setting, by each task, in a string of bits where each bit represents a data communications algorithm available for the task being executed on the compute node, a bit to true for each algorithm selected by the task; and performing, by all the tasks, an allreduce operation with a bitwise AND of all strings of bits associated with each task; and performing, by each task, communication with all other tasks using one or more data communication algorithms common to all tasks.
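The identification step in the claim can be illustrated with a small single-process simulation. This is a minimal sketch, not the patented implementation. The algorithm names in `ALGORITHMS` and the helper functions are hypothetical, and the "allreduce" is simulated by folding all masks with a bitwise AND and handing every rank the same result. In an MPI-based system, the same step would plausibly correspond to `MPI_Allreduce` with the predefined `MPI_BAND` operator.

```python
from functools import reduce
import operator

# Hypothetical catalogue of data communications algorithms: bit i <-> ALGORITHMS[i].
ALGORITHMS = ["binomial_tree", "ring", "recursive_doubling", "hardware_collective"]

def selection_mask(selected):
    """Build a task's string of bits: set bit i to true for each selected algorithm."""
    mask = 0
    for name in selected:
        mask |= 1 << ALGORITHMS.index(name)
    return mask

def allreduce_band(masks):
    """Simulated allreduce with a bitwise AND: every rank receives the same combined mask."""
    combined = reduce(operator.and_, masks)
    return [combined] * len(masks)

def common_algorithms(mask):
    """Decode a mask back into the algorithm names whose bits are set."""
    return [name for i, name in enumerate(ALGORITHMS) if mask & (1 << i)]

# Hypothetical selections: each rank picks algorithms its compute node supports in the geometry.
selections = {
    0: ["binomial_tree", "ring", "hardware_collective"],
    1: ["binomial_tree", "ring"],
    2: ["ring", "binomial_tree", "recursive_doubling"],
}

masks = [selection_mask(selections[rank]) for rank in sorted(selections)]
results = allreduce_band(masks)

for rank, mask in enumerate(results):
    # Each rank now knows which algorithms every other rank also selected.
    print(rank, common_algorithms(mask))
```

Because the AND clears any bit that at least one task left false, only algorithms selected by every task survive. In this example that is `binomial_tree` and `ring`. A single collective therefore replaces a pairwise exchange of capability lists.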
Address: Armonk NY US