Title MONITORING DISTRIBUTED SOFTWARE HEALTH AND MEMBERSHIP IN A COMPUTE CLUSTER
Abstract Techniques for monitoring distributed software health and membership of nodes and software components operating in a compute cluster are disclosed. In one embodiment, each node in the compute cluster operates a watchdog monitoring component in addition to software operating components. The watchdogs are provided with a list of all nodes in the compute cluster that identifies every node's neighboring nodes. Each watchdog checks the health of one of its neighboring nodes, ensuring that this neighboring node is healthy and is operating successfully. Additionally, each watchdog verifies the cluster membership of its other neighboring nodes to ensure that the cluster contains an adequate number of operating nodes and that an adequate number of watchdogs are present in the cluster. If an unhealthy or non-member node is identified, the watchdog may initiate corrective action and attempt to restore the node to a correct operational state.
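The monitoring cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a ring layout in which a node's neighbors are the next `radius` nodes in the list, and it replaces real health and membership probes with boolean flags. All names (`Node`, `build_neighbor_list`, `restore`, `watchdog_pass`) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    healthy: bool = True   # stand-in for a real health probe (assumption)
    member: bool = True    # stand-in for a real membership check (assumption)
    restarts: int = 0      # number of corrective actions applied to this node


def build_neighbor_list(names: List[str], radius: int = 2) -> Dict[str, List[str]]:
    """Map every node to its neighbors, assuming a ring of nodes.

    The ring layout is an assumption of this sketch; the abstract only says
    that the list identifies every node's neighboring nodes.
    """
    n = len(names)
    if not 0 < radius < n:
        raise ValueError("radius must be between 1 and len(names) - 1")
    return {
        name: [names[(i + k) % n] for k in range(1, radius + 1)]
        for i, name in enumerate(names)
    }


def restore(node: Node) -> None:
    """Hypothetical corrective action: return the node to a working state."""
    node.healthy = True
    node.member = True
    node.restarts += 1


def watchdog_pass(
    owner: str,
    cluster: Dict[str, Node],
    neighbors: Dict[str, List[str]],
) -> List[str]:
    """Run one watchdog cycle for `owner`; return the nodes it restored."""
    restored: List[str] = []
    first, *others = neighbors[owner]

    # Health check on exactly one neighboring node.
    if not cluster[first].healthy:
        restore(cluster[first])
        restored.append(first)

    # Membership check on the remaining neighboring nodes.
    for name in others:
        if not cluster[name].member:
            restore(cluster[name])
            restored.append(name)

    return restored
```

With four nodes `a` to `d` and the default radius, the watchdog on `a` health-checks `b` and verifies the membership of `c`. Running `watchdog_pass` for every node in turn covers the whole ring, so each node is health-checked by one watchdog and membership-checked by another.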
Publication No. US2011283149(A1) Publication Date 2011.11.17
Application No. US20100778177 Filing Date 2010.05.12
Applicant RICHMOND MICHAEL A.;INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor RICHMOND MICHAEL A.
Classification G06F11/30;G06F15/16 Main Classification G06F11/30
Agency Agent
Main Claim
Address