Abstract |
Methods, apparatuses, and computer program products for optimizing collective communications within a parallel computer comprising a plurality of hardware threads for executing software threads of a parallel application are provided. Embodiments include a processor of a parallel computer determining, for each software thread, an affinity of the software thread to a particular hardware thread. Each affinity indicates an assignment of a software thread to a particular hardware thread. The processor also generates one or more affinity domains based on the affinities of the software threads. Embodiments also include a processor generating, for each affinity domain, a topology of the affinity domain based on the affinities of the software threads to the hardware threads. According to embodiments of the present application, a processor also performs, based on the generated topologies of the affinity domains, a collective operation on one or more software threads. |
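The grouping of software threads into affinity domains can be illustrated with a minimal sketch. It assumes, purely for illustration, that each hardware thread is identified by a hypothetical `(mcm, core, hw_thread)` tuple and that the affinities are already known; the names `affinity` and `build_affinity_domains` are not taken from the application.

```python
from collections import defaultdict

# Hypothetical affinities: software thread id -> (mcm, core, hw_thread).
affinity = {
    0: (0, 0, 0),
    1: (0, 0, 1),
    2: (0, 1, 0),
    3: (1, 0, 0),
    4: (1, 0, 1),
}

def build_affinity_domains(affinity):
    """Group software threads by the hardware domain holding their hardware thread."""
    core_domains = defaultdict(list)   # keyed by (mcm, core)
    mcm_domains = defaultdict(list)    # keyed by mcm
    processor_domain = []              # every software thread on the processor
    for sw_thread, (mcm, core, _hw_thread) in sorted(affinity.items()):
        core_domains[(mcm, core)].append(sw_thread)
        mcm_domains[mcm].append(sw_thread)
        processor_domain.append(sw_thread)
    return dict(core_domains), dict(mcm_domains), processor_domain

core_domains, mcm_domains, processor_domain = build_affinity_domains(affinity)
```

In this sketch, software threads 0 and 1 share a core affinity domain, threads 0 through 2 share an MCM affinity domain, and all five threads belong to the processor affinity domain.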
Principal claim |
1. A method of optimizing collective communications within a parallel computer, the parallel computer comprising a plurality of hardware threads for executing software threads of a parallel application, the method comprising:
determining for each software thread, by a processor of the parallel computer, an affinity of the software thread to a particular hardware thread, each affinity indicating an assignment of a software thread to a particular hardware thread, wherein the processor further comprises one or more multi-chip modules (MCM), each MCM comprising a plurality of cores;
generating, based on the affinities of the software threads, one or more affinity domains, wherein an affinity domain indicates which software threads are assigned to hardware threads of a same hardware domain, including:
generating, for each core, a core affinity domain indicating the software threads assigned to the hardware threads within the core,
generating, for each MCM, a MCM affinity domain indicating the software threads assigned to the hardware threads within the MCM, and
generating, for the processor, a processor affinity domain indicating the software threads assigned to the hardware threads within the processor;
generating, for each affinity domain, a topology of the affinity domain based on the affinities of the software threads to the hardware threads, including generating, for each affinity domain, an n-ary tree representing a communication organization among the software threads associated with the affinity domain; and
performing, based on the generated topologies of the affinity domains, a collective operation on one or more software threads, wherein performing the collective operation on one or more software threads based on the generated topologies of the affinity domains includes performing, for each affinity domain, in accordance with the generated n-ary tree, a reduction operation on the software threads associated with the affinity domain. |
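The n-ary tree topology and the per-domain reduction recited in the claim can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: the tree is laid out heap-style over the list of software threads in one affinity domain, the reduction is simulated by a recursive function rather than by real inter-thread communication, and the names `build_nary_tree` and `tree_reduce` are hypothetical. The claim does not prescribe this particular tree layout.

```python
def build_nary_tree(members, n):
    """Return {parent: [children]} for an n-ary tree over one affinity domain.

    Assumed layout: members[0] is the root, and the children of the member
    at position i are the members at positions n*i + 1 .. n*i + n.
    """
    tree = {member: [] for member in members}
    for i, member in enumerate(members):
        for k in range(1, n + 1):
            child_pos = n * i + k
            if child_pos < len(members):
                tree[member].append(members[child_pos])
    return tree

def tree_reduce(tree, root, values, op):
    """Combine a thread's value with the reduced values of its subtrees."""
    result = values[root]
    for child in tree[root]:
        result = op(result, tree_reduce(tree, child, values, op))
    return result

# Hypothetical affinity domain of five software threads, each holding one value.
domain = [0, 1, 2, 3, 4]
values = {0: 10, 1: 20, 2: 30, 3: 40, 4: 50}

tree = build_nary_tree(domain, n=2)
total = tree_reduce(tree, domain[0], values, lambda a, b: a + b)
```

With `n=2`, thread 0 is the root with children 1 and 2, and thread 1 has children 3 and 4, so partial sums flow from the leaves toward the root. In a hierarchy of core, MCM, and processor affinity domains, the same pattern could be applied per domain, with the root of each lower-level domain contributing its result to the next level up.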