摘要 |
Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.
|