Abstract |
<p>Methods, apparatus, and computer program products are disclosed for computer hardware fault diagnosis carried out in a parallel computer, where the parallel computer includes a plurality of compute nodes. The compute nodes are coupled for data communications by at least two independent data communications networks, where each data communications network includes data communications links among the compute nodes. Typical embodiments carry out hardware fault diagnosis by executing a collective operation through a first data communications network upon a plurality of the compute nodes of the computer, executing the same collective operation through a second data communications network upon the same plurality of the compute nodes of the computer, and comparing results of the collective operations.</p> |
Applicants |
INTERNATIONAL BUSINESS MACHINES CORPORATION;IBM UNITED KINGDOM LIMITED;ARCHER, CHARLES;MEGERIAN, MARK;RATTERMAN, JOSEPH;SMITH, BRIAN |
Inventors |
ARCHER, CHARLES;MEGERIAN, MARK;RATTERMAN, JOSEPH;SMITH, BRIAN |
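
The diagnosis described in the abstract (run the same collective operation over two independent networks, then compare the results) can be illustrated with a minimal sketch. This is a single-process simulation, not the patented implementation: the function names `simulated_allreduce_sum` and `networks_agree`, the choice of a sum all-reduce as the collective, and the bit-flip fault model are all assumptions made for illustration.

```python
def simulated_allreduce_sum(contributions, faulty_node=None):
    """Hypothetical stand-in for a sum all-reduce run over one network.

    Each entry in `contributions` is one compute node's input value.
    If `faulty_node` is set, that node's value is corrupted in transit
    (lowest bit flipped) to model a bad link or adapter on this network.
    Returns the result as seen by every node.
    """
    received = [
        (value ^ 1) if index == faulty_node else value
        for index, value in enumerate(contributions)
    ]
    total = sum(received)
    # In an all-reduce, every participating node ends up with the same total.
    return [total] * len(contributions)


def networks_agree(contributions, first_network, second_network):
    """Run the same collective over both networks and compare the results.

    Returns True when the per-node results match. A mismatch suggests a
    hardware fault on one of the two networks; this sketch does not
    determine which one.
    """
    first_results = first_network(contributions)
    second_results = second_network(contributions)
    return first_results == second_results


def healthy_network(contributions):
    return simulated_allreduce_sum(contributions)


def network_with_fault(contributions):
    # Assumed fault: node 2's contribution is corrupted on this network.
    return simulated_allreduce_sum(contributions, faulty_node=2)


node_inputs = [1, 2, 3, 4]
both_healthy = networks_agree(node_inputs, healthy_network, healthy_network)
one_faulty = networks_agree(node_inputs, healthy_network, network_with_fault)
```

In this sketch `both_healthy` is true and `one_faulty` is false. On a real parallel computer the two calls would be issued through the machine's actual collective-communication layer over physically separate networks, and narrowing a mismatch down to a specific link or node would need further steps that the abstract does not detail.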