Title Suicide among well-mannered cluster nodes experiencing heartbeat failure
Abstract Methods for re-configuring a cluster computer system of multiple nodes when the cluster experiences communications failure. First and second nodes of the cluster have respective channel controllers. A SCSI channel and the controllers communicatively connect the nodes. When a node becomes aware of a possible communications failure, the node attempts to determine the authenticity of the failure and responds according to the determined authenticity. According to one method, a first node detects a failure of heartbeat (node-to-node) communications on the channel and then tests a physical drive on the channel. If the testing is successful, the first node kills the other node. If the testing is unsuccessful, the first node commits suicide. In one embodiment, multiple channels communicatively couple the first and second nodes, and the first node selects one of the channels for node-to-node communications. In this environment, choosing a physical drive involves testing node-to-node communications on another of the channels if no physical drive is online on the channel (and terminating the re-configuring method). If a drive is available, the first node uses the first physical drive online on the channel for testing. In another method, the second node initially detects the communications failure and communicates that by attempting to negotiate with the first node for a new configuration of the computer system. The first node tests a physical drive in response and negotiates with the second node if the testing was successful. If the testing was unsuccessful, the first node commits suicide.
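The decision logic described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the names `Action`, `choose_test_drive`, `on_heartbeat_failure`, and `on_negotiation_request` are hypothetical, and the `test_drive` callback stands in for whatever drive test the channel controller actually performs, which the abstract does not specify.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Action(Enum):
    """Possible outcomes for the first node (hypothetical labels)."""
    KILL_PEER = "kill_peer"
    SUICIDE = "suicide"
    NEGOTIATE = "negotiate"
    TEST_OTHER_CHANNEL = "test_other_channel"


def choose_test_drive(online_drives: Sequence[str]) -> Optional[str]:
    """Return the first physical drive online on the channel, or None."""
    return online_drives[0] if online_drives else None


def on_heartbeat_failure(
    online_drives: Sequence[str],
    test_drive: Callable[[str], bool],
) -> Action:
    """First method: the first node itself detects the heartbeat failure."""
    drive = choose_test_drive(online_drives)
    if drive is None:
        # No drive online on this channel: test node-to-node communications
        # on another channel and end the re-configuring method here.
        return Action.TEST_OTHER_CHANNEL
    if test_drive(drive):
        # The channel works from this node's side, so the peer is at fault.
        return Action.KILL_PEER
    # This node cannot reach the drive, so it removes itself.
    return Action.SUICIDE


def on_negotiation_request(
    drive: str,
    test_drive: Callable[[str], bool],
) -> Action:
    """Second method: the peer detected the failure and asks to negotiate."""
    return Action.NEGOTIATE if test_drive(drive) else Action.SUICIDE
```

For example, `on_heartbeat_failure(["drive0"], lambda d: True)` yields `Action.KILL_PEER`, while the same call with a test that always fails yields `Action.SUICIDE`. The second function takes the drive as a parameter because the abstract does not say how a drive is chosen in that method.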
Publication number US6460149(B1) Publication date 2002.10.01
Application number US20000547000 Filing date 2000.04.11
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors ROWLANDS MOHAN BABU;GNANASIVAM GOVINDARAJU
Classification G06F11/00;(IPC1-7):G06F11/00 Main classification G06F11/00
Agency Agent
Principal claim
Address