Title: Parallel application checkpoint image compression
Abstract: Parallel application checkpoint image compression may be carried out in a parallel computer. The parallel computer may include a plurality of compute nodes, where each node is configured to execute one or more parallel tasks of the parallel application. The parallel tasks may be organized into an operational group for collective communications. In such a parallel computer, checkpoint image compression may include: generating, by each task of the parallel application, an image for checkpointing the parallel application; selecting, by an image management task, one of the images as a base template image; constructing, by the image management task, a binary radix tree, including storing differences between each task's image and the base template image in the binary radix tree; and storing, by the image management task as a checkpoint for the parallel application, the binary radix tree and the base template image, without storing every task's image.
Publication number: US9110930(B2) Publication date: 2015.08.18
Application number: US201313973376 Filing date: 2013.08.22
Applicant: International Business Machines Corporation Inventors: Archer Charles J.;Lynam Benjamin E.
Classification: G06F17/30;G06K9/62;G06F11/34 Main classification: G06F17/30
Agency: Kennedy Lenart Spraggins LLP Agent: Lenart Edward J.;Kennedy Lenart Spraggins LLP
Main claim: 1. A method of parallel application checkpoint image compression in a parallel computer, the parallel computer comprising a plurality of compute nodes, each compute node configured to execute one or more parallel tasks of the parallel application, the parallel tasks organized into an operational group, the method comprising: generating, by each task of the parallel application, an image for checkpointing the parallel application; selecting, by an image management task, one of the images as a base template image; constructing, by the image management task, a binary radix tree, including storing differences between each task's image and the base template image in the binary radix tree; and storing, by the image management task as a checkpoint for the parallel application, the binary radix tree and the base template image, without storing every task's image.
Address: Armonk NY US
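
Illustrative sketch (not part of the patent record): the four steps recited in the abstract and in claim 1 can be approximated in a single process as shown below. This is a minimal sketch under stated assumptions, not the patented implementation. The record does not say how the tree is keyed, how differences are encoded, or how the base template is chosen, so the code assumes equal-length images, byte-run differences, a fixed-depth binary trie keyed on the bits of the task rank (an uncompressed stand-in for a binary radix tree), and rank 0 as the base template. All names (`diff_against`, `BinaryTrie`, `compress_checkpoint`, `restore_image`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A diff is a list of (offset, replacement bytes) runs. Assumed encoding.
Diff = List[Tuple[int, bytes]]


def diff_against(base: bytes, image: bytes) -> Diff:
    """Collect the byte runs where `image` differs from `base`."""
    if len(base) != len(image):
        raise ValueError("this sketch assumes equal-length images")
    runs: Diff = []
    i, n = 0, len(base)
    while i < n:
        if base[i] != image[i]:
            start = i
            while i < n and base[i] != image[i]:
                i += 1
            runs.append((start, image[start:i]))
        else:
            i += 1
    return runs


def apply_diff(base: bytes, diff: Diff) -> bytes:
    """Rebuild a task image by patching the base template with its diff."""
    out = bytearray(base)
    for offset, data in diff:
        out[offset:offset + len(data)] = data
    return bytes(out)


@dataclass
class Node:
    children: List[Optional["Node"]] = field(default_factory=lambda: [None, None])
    diff: Optional[Diff] = None  # set only on leaves


class BinaryTrie:
    """Fixed-depth binary trie keyed on task-rank bits, most significant first."""

    def __init__(self, key_bits: int) -> None:
        self.key_bits = key_bits
        self.root = Node()

    def _path(self, rank: int) -> List[int]:
        return [(rank >> b) & 1 for b in reversed(range(self.key_bits))]

    def insert(self, rank: int, diff: Diff) -> None:
        node = self.root
        for bit in self._path(rank):
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.diff = diff

    def lookup(self, rank: int) -> Optional[Diff]:
        node = self.root
        for bit in self._path(rank):
            node = node.children[bit]
            if node is None:
                return None
        return node.diff


def compress_checkpoint(images: List[bytes], base_rank: int = 0):
    """Play the image management task: pick a base, store only diffs in the tree."""
    base = images[base_rank]
    key_bits = max(1, (len(images) - 1).bit_length())
    tree = BinaryTrie(key_bits)
    for rank, image in enumerate(images):
        if rank != base_rank:
            tree.insert(rank, diff_against(base, image))
    return base, tree  # the checkpoint: base template plus tree, not every image


def restore_image(base: bytes, tree: BinaryTrie, rank: int) -> bytes:
    """Recover one task's image; a rank with no leaf is the base itself."""
    diff = tree.lookup(rank)
    return base if diff is None else apply_diff(base, diff)
```

In this sketch, a call such as `base, tree = compress_checkpoint(images)` keeps one full image plus per-task differences, and `restore_image(base, tree, rank)` rebuilds any task's image on restart. The saving depends on how similar the task images are, which is the premise of the approach; a real system would also need to gather the images from the tasks, which is omitted here.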