Title of invention: Using accelerators in a hybrid architecture for system checkpointing
Abstract: A hybrid node of a High Performance Computing (HPC) cluster uses accelerator nodes for checkpointing to increase overall efficiency of the multi-node computing system. The host node or processor node reads/writes checkpoint data to the accelerators. After offloading the checkpoint data to the accelerators, the host processor can continue processing while the accelerators communicate the checkpoint data with the host or wait for the next checkpoint. The accelerators may also perform dynamic compression and decompression of the checkpoint data to reduce the checkpoint size and reduce network loading. The accelerators may also communicate with other node accelerators to compare checkpoint data to reduce the amount of checkpoint data stored to the host.
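The abstract does not specify how the accelerators compress checkpoints or compare them with other nodes. The Python sketch below shows one plausible scheme under stated assumptions: checkpoints are split into fixed-size blocks, each block is identified by its SHA-256 digest, and only blocks whose digest is not already known are compressed with zlib and sent to storage. The block size, the function names, and the shared `known_hashes` set (standing in for communication between node accelerators) are hypothetical illustrations, not the patented method.

```python
from __future__ import annotations

import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block granularity for comparison


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a checkpoint image into fixed-size blocks (last block may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def prepare_checkpoint(data: bytes, known_hashes: set[str]) -> tuple[list[str], dict[str, bytes]]:
    """Return (manifest, new_blocks) for one node's checkpoint.

    manifest   -- ordered block digests needed to rebuild the checkpoint
    new_blocks -- compressed blocks whose digest is not yet in storage
    Digests are taken on the uncompressed bytes, so comparison across
    nodes does not depend on compression settings.
    """
    manifest: list[str] = []
    new_blocks: dict[str, bytes] = {}
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        manifest.append(digest)
        if digest not in known_hashes and digest not in new_blocks:
            new_blocks[digest] = zlib.compress(block)
    return manifest, new_blocks


def restore_checkpoint(manifest: list[str], store: dict[str, bytes]) -> bytes:
    """Decompress and concatenate blocks in manifest order."""
    return b"".join(zlib.decompress(store[d]) for d in manifest)
```

If two nodes' checkpoints share identical regions, those blocks are stored once and the second node transmits only its differing blocks. The trade-off is hashing and compression work on the accelerator in exchange for less network traffic and storage.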
Publication number: US9104617 (B2)  Publication date: 2015.08.11
Application number: US200812270144  Filing date: 2008.11.13
Applicant: International Business Machines Corporation  Inventors: Darrington David L; Markland Matthew W; Sanders Philip James; Shok Richard Michael
Classification: G06F15/16; G06F11/14; H04L29/06; G06F11/20  Main classification: G06F15/16
Agency: Martin & Associates, LLC  Agents: Martin & Associates, LLC; Petersen Bret J.
Main claim: 1. A multi-node computer system comprising: a plurality of hybrid compute nodes that each comprise a processor and an associated accelerator local to the compute node; checkpoint management software residing in the accelerator; a network connecting the plurality of compute nodes and a storage mechanism; wherein a checkpoint of the processor is made and written through a local operation in the node to the accelerator, and wherein the checkpoint management software in the accelerator then processes the checkpoint by communicating checkpoint data over the network to the storage mechanism while the processor continues execution.
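The claim's key point is that the processor hands its checkpoint to the local accelerator and keeps executing while the accelerator forwards the data to storage. The Python sketch below models that overlap with a worker thread and a queue, under loud assumptions: the thread stands in for the accelerator's checkpoint management software, a plain dict stands in for the networked storage mechanism, and the class and function names (`AcceleratorCheckpointer`, `run_host`) are hypothetical.

```python
import queue
import threading
import zlib


class AcceleratorCheckpointer:
    """Hypothetical model of checkpoint management software on the accelerator."""

    def __init__(self, storage: dict):
        self._inbox: queue.Queue = queue.Queue()
        self._storage = storage  # stands in for the storage mechanism on the network
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def write_checkpoint(self, step: int, state: bytes) -> None:
        # "Local operation": copy the state to the accelerator and return at once,
        # so the processor can continue execution.
        self._inbox.put((step, bytes(state)))

    def _run(self) -> None:
        # Accelerator side: drain checkpoints and push them to storage.
        while True:
            item = self._inbox.get()
            if item is None:  # shutdown sentinel
                self._inbox.task_done()
                break
            step, state = item
            self._storage[step] = zlib.compress(state)  # models the network transfer
            self._inbox.task_done()

    def close(self) -> None:
        """Flush pending checkpoints and stop the worker."""
        self._inbox.put(None)
        self._worker.join()


def run_host(n_steps: int, checkpointer: AcceleratorCheckpointer) -> bytes:
    """Processor side: compute, checkpoint each step, never wait on storage."""
    state = bytearray(1024)
    for step in range(n_steps):
        state[step % len(state)] = step % 256  # placeholder for real computation
        checkpointer.write_checkpoint(step, state)
    return bytes(state)
```

Copying the state in `write_checkpoint` matters: the host keeps mutating its buffer, so the accelerator must own an independent snapshot, much as a real design would copy into accelerator-local memory.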
Address: Armonk, NY, US