Title of Invention: Recording A Communication Pattern and Replaying Messages in a Parallel Computing System
Abstract: A parallel computer system includes a plurality of compute nodes. Each compute node includes at least one processor, at least one memory, and a direct memory access (DMA) engine coupled to the at least one processor and the at least one memory. The system also includes a network interconnecting the compute nodes. The network operates a global message-passing application for performing communications across the network. Local instances of the global message-passing application operate at each compute node to carry out local processing operations independently of the processing operations carried out at the other compute nodes. Each DMA engine is configured to interact with its local instance of the global message-passing application via injection FIFO metadata describing an injection FIFO in the corresponding memory. The local instances of the global message-passing application are configured to record, in that injection FIFO, the message descriptors associated with the messages of an arbitrary communication pattern during one iteration of an executing application program, and to replay those message descriptors during a subsequent iteration of the program.
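The record-and-replay mechanism in the abstract can be illustrated with a minimal Python model. This is a sketch under stated assumptions, not the patented implementation: the descriptor fields (`dest_node`, `offset`, `length`), the `InjectionFifo` class with its `start`/`head` indices, and the `dma_drain` function standing in for the DMA engine are all hypothetical names chosen for illustration. The point it shows is that descriptors are written into the injection FIFO once, during the recording iteration, and every later iteration only rewinds the FIFO metadata so the engine processes the same descriptors again.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MessageDescriptor:
    """Hypothetical descriptor: what a DMA engine needs to send one message."""
    dest_node: int  # destination compute node
    offset: int     # offset of the payload in local memory
    length: int     # payload length in bytes


@dataclass
class InjectionFifo:
    """Hypothetical injection FIFO plus its metadata (start and head indices).

    The tail is implicit: len(slots), one past the last descriptor written.
    """
    capacity: int
    slots: List[MessageDescriptor] = field(default_factory=list)
    start: int = 0  # first recorded descriptor
    head: int = 0   # next descriptor the DMA engine will process

    def record(self, desc: MessageDescriptor) -> None:
        """Software appends a descriptor (done once, in the recording iteration)."""
        if len(self.slots) >= self.capacity:
            raise OverflowError("injection FIFO full")
        self.slots.append(desc)

    def replay(self) -> None:
        """Rewind head to the start of the recorded pattern; descriptors are reused."""
        self.head = self.start


def dma_drain(fifo: InjectionFifo) -> List[MessageDescriptor]:
    """Simulated DMA engine: consume descriptors from head to tail, in order."""
    sent = fifo.slots[fifo.head:]
    fifo.head = len(fifo.slots)
    return sent


# Iteration 0: record an arbitrary communication pattern and send it once.
fifo = InjectionFifo(capacity=16)
for d in (MessageDescriptor(3, 0, 256),
          MessageDescriptor(7, 256, 128),
          MessageDescriptor(1, 384, 64)):
    fifo.record(d)
log = [dma_drain(fifo)]

# Iterations 1..3: replay by rewinding metadata, without rebuilding descriptors.
for _ in range(3):
    fifo.replay()
    log.append(dma_drain(fifo))
```

In this model, replaying costs one metadata update per iteration instead of re-creating every descriptor. The sketch assumes, as an illustration only, that the destinations and payload buffers of the pattern stay the same from one iteration to the next.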
Publication No.: US2011010471 (A1)    Publication Date: 2011.01.13
Application No.: US20090500715    Filing Date: 2009.07.10
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION    Inventors: HEIDELBERGER PHILIP; KUMAR SAMEER
Classification: G06F13/28    Main Classification: G06F13/28