Title: Reliable fault resolution in a cluster
Abstract: A method and system for localizing and resolving a fault in a cluster environment. The cluster is configured with at least one multi-homed node and at least one gateway for each network interface. Heartbeat messages are exchanged between peer nodes and the gateway at predefined periodic intervals. If any node or gateway misses a heartbeat message, an ICMP echo is issued to each node and gateway in the cluster on each network interface. If the ICMP echo responses validate neither a node loss nor a network loss, an application-level ping is issued to determine whether the fault behind the missing heartbeat is a transient error condition or an application software fault.
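The escalation described in the abstract can be sketched as a small decision function. This is only an illustration, not the patented method: the names `Fault`, `classify_fault`, `icmp_echo`, and `app_ping` are hypothetical, the probes are injected as callables rather than sent on a real network, and the rules used to tell a node loss from a network loss are assumptions, since the abstract does not define them.

```python
from enum import Enum
from typing import Callable, Dict, List


class Fault(Enum):
    NODE_LOSS = "node loss"
    NETWORK_LOSS = "network loss"
    APPLICATION_FAULT = "application software fault"
    TRANSIENT = "transient error condition"


def classify_fault(
    suspect: str,
    nodes: List[str],
    gateways: Dict[str, str],              # interface name -> gateway name
    icmp_echo: Callable[[str, str], bool],  # (target, interface) -> answered?
    app_ping: Callable[[str], bool],        # node -> application answered?
) -> Fault:
    """Classify the fault behind a missed heartbeat from `suspect`."""
    interfaces = list(gateways)

    # Step 1: echo every node and every gateway on every interface.
    node_ok = {n: [icmp_echo(n, i) for i in interfaces] for n in nodes}
    gateway_ok = {i: icmp_echo(gateways[i], i) for i in interfaces}

    # Assumption: a suspect that answers on no interface is a lost node.
    if not any(node_ok[suspect]):
        return Fault.NODE_LOSS

    # Assumption: a silent gateway, or a suspect reachable on only some
    # interfaces, indicates a network loss.
    if not all(gateway_ok.values()) or not all(node_ok[suspect]):
        return Fault.NETWORK_LOSS

    # Step 2: neither loss was validated, so fall back to an
    # application-level ping to separate the two remaining cases.
    return Fault.TRANSIENT if app_ping(suspect) else Fault.APPLICATION_FAULT
```

For example, with two interfaces whose gateways are `gw0` and `gw1`, a probe function that reports only `gw1` as silent would yield `Fault.NETWORK_LOSS`, while fully successful echoes combined with a failed application-level ping would yield `Fault.APPLICATION_FAULT`.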
Publication number: US7284147(B2) Publication date: 2007.10.16
Application number: US20030649269 Filing date: 2003.08.27
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: RAO SUDHIR G.;JACKSON BRUCE M.;DAVIS MARK C.;SRIDHARA SRIKANTH N.
Classification: G06F11/00;H04L12/26;G06F11/07;H04L12/24;H04L12/28;H04L12/56;H04L12/66;H04L29/10 Main classification: G06F11/00
Agency: Agent:
Principal claim:
Address: