Title of invention Method and system for achieving collective consistency in detecting failures in a distributed computing system
Abstract A method and apparatus are disclosed for achieving collective consistency in the detection and reporting of failures in a distributed computing system having multiple processors. Each processor can be called by a parallel application to report system status. Initially, each processor sends the other processors its view of the status of the processors. It then waits for similar views from the other processors, except those regarded as failed in its own view. If the received views are identical to the processor's own view, the processor returns its view to the parallel application. In a preferred embodiment, if the received views are not identical to its view, the processor sets its view to the union of the received views and its current view, and the steps are repeated. Alternatively, the steps are repeated if the processor does not have information that each of the processors not regarded as failed in its view forms an identical union view. In another preferred embodiment, the method is terminated if the processors that are not regarded as failed do not form a quorum. Alternatively, after sending its view, the processor waits for an exit condition. Depending on the exit condition, the processor sets its view to a quorum view and sends a "DECIDE" message to the other processors. In another embodiment, the processor updates its view and the method steps are repeated.
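The union-based embodiment described in the abstract can be sketched as a small round-based simulation. This is an illustrative sketch only, not the patented implementation: the function name `collective_consistency`, the `crashed` and `max_rounds` parameters, the synchronous-round model, and the rule that a silent peer is added to the waiting processor's view after a timeout are all assumptions introduced here. It also assumes that only processors that have actually crashed are ever suspected.

```python
def collective_consistency(initial_views, crashed=frozenset(), max_rounds=100):
    """Simulate the union-based agreement on failed-processor views.

    initial_views: dict mapping processor id -> set of ids it regards as failed.
    crashed: ids that never send a view (hypothetical failure model).
    Returns (decided_views, rounds_used).
    """
    all_pids = set(initial_views)
    # Only live processors take part; crashed ones stay silent.
    views = {p: frozenset(v) for p, v in initial_views.items() if p not in crashed}
    decided = {}

    for round_no in range(1, max_rounds + 1):
        # Every live processor "sends" its current view at the start of the round.
        snapshot = dict(views)
        for p, own in snapshot.items():
            if p in decided:
                continue
            # Wait only for processors not regarded as failed in p's own view.
            peers = all_pids - own - {p}
            # Assumption: a peer that never answers is detected by timeout.
            silent = {q for q in peers if q in crashed}
            received = [snapshot[q] for q in peers - silent]
            if not silent and all(v == own for v in received):
                # All received views match: return the view to the application.
                decided[p] = own
            else:
                # Otherwise adopt the union of all views seen and repeat.
                views[p] = own.union(silent, *received)
        if len(decided) == len(views):
            return decided, round_no

    raise RuntimeError("no agreement within max_rounds")


# Four processors; processor 3 has crashed, but only processor 0 knows it.
example_views = {0: {3}, 1: set(), 2: set(), 3: set()}
final_views, rounds = collective_consistency(example_views, crashed={3})
```

In this example the three surviving processors all end with the view `{3}` after two rounds: the first round spreads the suspicion by union, and the second confirms that every received view is identical. The quorum check and the "DECIDE" message variant mentioned in the abstract are not modeled here.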
Publication number US5682470(A) Publication date 1997.10.28
Application number US19950522651 Filing date 1995.09.01
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors DWORK, CYNTHIA; HO, CHING-TIEN; STRONG, JR., HOVEY RAYMOND
Classification G06F11/00;G06F11/18;(IPC1-7):G06F11/00 Main classification G06F11/00
Agency Agent
Main claim
Address