Title of invention: REMOTE CHECKPOINT MEMORY SYSTEM AND PROTOCOL FOR FAULT-TOLERANT COMPUTER SYSTEM
Abstract: A mechanism is provided for maintaining a consistent, periodically updated state in main memory without constraining normal computer operation, thereby enabling a computer system to recover from faults without loss of data or processing continuity. A first computer includes a processor and input/output elements connected to a main memory subsystem that includes a primary memory element. A second computer has a remote checkpoint memory element, connected to the main memory subsystem of the first computer, which may include one or more buffer memories and a shadow memory. During normal processing, an image of the data written to the primary memory element is captured by the remote checkpoint memory element. When a new checkpoint is desired (establishing a consistent state in main memory to which all executing applications can safely return following a fault), the previously captured data is used to establish a new checkpointed state in the second computer. If the first computer fails, the second computer can be restarted to operate from the last checkpoint established for the first computer. This structure and protocol can guarantee a consistent state in main memory, thus enabling fault-tolerant operation.
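The capture, checkpoint, and restart sequence described in the abstract can be illustrated with a minimal software sketch. This is not the patented hardware design: the class and method names (`RemoteCheckpointMemory`, `PrimaryComputer`, `capture`, `checkpoint`, `recover`) are hypothetical, memory is modeled as a Python dictionary, and discarding uncommitted writes on failure is an assumption made here so that the recovered state matches the last checkpoint.

```python
class RemoteCheckpointMemory:
    """Hypothetical model of the second computer's checkpoint memory element."""

    def __init__(self):
        self.buffer = []   # writes captured since the last checkpoint
        self.shadow = {}   # last consistent (checkpointed) memory image

    def capture(self, address, value):
        # During normal processing, record an image of each primary write.
        self.buffer.append((address, value))

    def checkpoint(self):
        # Apply the previously captured writes, in order, to the shadow
        # memory, establishing a new checkpointed state.
        for address, value in self.buffer:
            self.shadow[address] = value
        self.buffer.clear()

    def discard_uncommitted(self):
        # Assumption: writes captured after the last checkpoint are dropped
        # when the first computer fails.
        self.buffer.clear()


class PrimaryComputer:
    """Hypothetical model of the first computer's main memory subsystem."""

    def __init__(self, remote):
        self.memory = {}       # primary memory element
        self.remote = remote   # connection to the remote checkpoint memory

    def write(self, address, value):
        self.memory[address] = value
        self.remote.capture(address, value)

    def request_checkpoint(self):
        self.remote.checkpoint()


def recover(remote):
    """Return the memory image the second computer would restart from."""
    remote.discard_uncommitted()
    return dict(remote.shadow)


def demo():
    remote = RemoteCheckpointMemory()
    primary = PrimaryComputer(remote)
    primary.write(0x10, "a")
    primary.write(0x20, "b")
    primary.request_checkpoint()
    primary.write(0x10, "c")   # captured, but never checkpointed
    return recover(remote)     # simulated failure of the first computer
```

In this sketch, `demo()` returns the image from the last checkpoint, so the write of `"c"` made after the checkpoint does not appear in the recovered state.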
Publication number: EP0864126(A2) Publication date: 1998.09.16
Application number: EP19960943524 Filing date: 1996.11.27
Applicant: TEXAS MICRO INC. Inventor: STIFFLER, JACK, J.
Classification: G06F11/14;G06F11/20;(IPC1-7):G06F11/20 Main classification: G06F11/14
Agency: Agent:
Main claim:
Address: