Title of invention: METHOD AND APPARATUS FOR TEMPLATE BASED PARALLEL CHECKPOINTING
Abstract: A method and apparatus for a template-based parallel checkpoint save on a massively parallel supercomputer system, using a parallel checksum algorithm such as rsync. In preferred embodiments, the checkpoint data for each node is compared with a previously produced template checkpoint file that resides in storage. Embodiments herein greatly decrease the amount of data that must be transmitted and stored, giving faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high-speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional lossless data compression algorithms to further reduce the overall checkpoint size.
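The comparison against a template can be illustrated with a minimal sketch. It is not the patented method: it assumes fixed-offset blocks rather than rsync's rolling checksum, a single node rather than a broadcast over an interconnect, and SHA-256 and zlib as stand-ins for the checksum and the lossless compressor. The names `BLOCK_SIZE`, `make_delta`, and `restore` are hypothetical.

```python
import hashlib
import zlib

# Hypothetical block size, kept tiny so the example is easy to follow.
BLOCK_SIZE = 4


def split_blocks(data, size=BLOCK_SIZE):
    """Cut data into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksum(block):
    """Checksum of one block; SHA-256 is an assumption, not from the source."""
    return hashlib.sha256(block).hexdigest()


def make_delta(template, node_data):
    """Keep only the blocks of node_data whose checksum differs from the template."""
    template_sums = [checksum(b) for b in split_blocks(template)]
    changed = {}
    for index, block in enumerate(split_blocks(node_data)):
        digest = checksum(block)
        if index >= len(template_sums) or template_sums[index] != digest:
            # Store the checksum with the losslessly compressed block.
            changed[index] = (digest, zlib.compress(block))
    return {"length": len(node_data), "blocks": changed}


def restore(template, delta):
    """Rebuild node data from the template plus the stored changed blocks."""
    template_blocks = split_blocks(template)
    block_count = -(-delta["length"] // BLOCK_SIZE)  # ceiling division
    out = []
    for index in range(block_count):
        if index in delta["blocks"]:
            digest, packed = delta["blocks"][index]
            block = zlib.decompress(packed)
            if checksum(block) != digest:
                raise ValueError("corrupt block %d" % index)
            out.append(block)
        else:
            out.append(template_blocks[index])
    return b"".join(out)[:delta["length"]]


if __name__ == "__main__":
    template = b"AAAABBBBCCCCDDDD"
    node = b"AAAAXXXXCCCCDDDDEE"
    delta = make_delta(template, node)
    print(sorted(delta["blocks"]))  # indices of blocks that had to be saved
    print(restore(template, delta) == node)
```

In this sketch only the blocks that differ from the template are written, so a node whose state is close to the template contributes very little to the checkpoint. A real system would also have to handle data that shifts position between checkpoints, which is what rsync's rolling checksum addresses and this sketch does not.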
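Publication No.: US2008092030(A1)  Publication date: 2008.04.17
Application No.: US20070953037  Filing date: 2007.12.08
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: ARCHER CHARLES J.; INGLETT TODD A.
Classification: G06F11/14  Main classification: G06F11/14
Agency:  Agent:
Main claim:
Address: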