Title Inter-node communication scheme for sharing node operating status
Abstract A gossiping scheme for sharing node status in a cluster of nodes provides a robust mechanism for determining node status within the cluster. Nodes transmit gossip messages to one another, the gossip messages listing other nodes in the cluster that are operational. When a node does not receive a gossip message from a particular node within a predetermined time period, the node transmits messages to the other nodes indicating that the particular node is down. However, if another node has received a packet from the particular node within the predetermined time period and receives the node down message, then the other node responds with a node alive message.
Publication No. US9553789(B2) Publication Date 2017.01.24
Application No. US201414314146 Filing Date 2014.06.25
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors Ganapathy Arunachalam;Mishra Rajeev;Russell Lance W.;Vaddagiri Murali
Classification G06F15/16;H04L12/26 Main Classification G06F15/16
Agency Mitch Harris, Atty at Law, LLC Agents Mitch Harris, Atty at Law, LLC;Harris Andrew M.;Petrokaitis Joseph J.
Principal Claim 1. A method for determining node operating status among a cluster of nodes of a computer system, the method comprising: first transmitting gossip messages directly between node pairs in the cluster of nodes, wherein the gossip messages contain an indication of operating status of other nodes in the cluster of nodes, wherein the other nodes are nodes other than the nodes in the node pairs; receiving the gossip messages at individual nodes of the node pairs; responsive to the receiving the gossip messages at the individual nodes, at the other nodes, locally updating a local database of operating status according to the received gossip messages, wherein the updating sets a value of a local operating status kept by the individual nodes for a particular one of the other nodes to a non-operational status if the receiving by the individual nodes has not received a gossip message from the particular one of the other nodes during a predetermined time period; responsive to the locally updating setting the local operating status of the particular one of the other nodes to a non-operational status, second transmitting a node down message separate from the gossip messages that indicates the non-operational status of the particular node to the other nodes in the cluster; at a first node other than the particular node, receiving the node down message; responsive to receiving the node down message, determining whether or not the first node has received a gossip message from the particular node during the predetermined time period; responsive to determining that the first node has received the gossip message from the particular node during the predetermined time period, transmitting a node alive message from the first node indicating that the status of the particular node is operational and setting the local operating status of the particular node at the first node to an operational status; and repeating the first transmitting, receiving, updating and second transmitting at each of the nodes in the node pairs, so that the local status kept by each of the nodes reflects the status of each of the other nodes in the cluster, wherein the first transmitting selectively transmits gossip messages containing an indication of operating status of other nodes depending on whether the operating status of the other nodes in the local database is set to non-operational, whereby the first transmitting halts gossip messaging for nodes marked non-operational.
Address Armonk NY US
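
The message flow described in the abstract and claim 1 (gossip timeout, node down message, node alive rebuttal) can be illustrated with a minimal in-memory sketch. This is not the patented implementation: the `Node` class, its method names, the tuple message format, and the `TIMEOUT` value are all hypothetical, and no real networking is performed. Two behaviors are assumptions that the record does not state: a node alive message restarts the receiver's timer for the revived node, and a node that cannot rebut a node down message marks the target as down locally.

```python
from dataclasses import dataclass, field

TIMEOUT = 5.0  # hypothetical "predetermined time period", in seconds


@dataclass
class Node:
    name: str
    peers: list
    last_heard: dict = field(default_factory=dict)  # peer -> time of last gossip received
    status: dict = field(default_factory=dict)      # local status database: peer -> "up"/"down"

    def __post_init__(self):
        # Start with every peer considered operational and last heard at t=0.
        for peer in self.peers:
            self.last_heard.setdefault(peer, 0.0)
            self.status.setdefault(peer, "up")

    def gossip_payload(self):
        """Peers this node currently lists as operational in its gossip."""
        return sorted(p for p, s in self.status.items() if s == "up")

    def receive_gossip(self, sender, now):
        """Record that a gossip message arrived directly from `sender`."""
        self.last_heard[sender] = now
        self.status[sender] = "up"

    def check_timeouts(self, now):
        """Mark silent peers down and return the node down messages to send."""
        messages = []
        for peer in self.peers:
            if self.status[peer] == "up" and now - self.last_heard[peer] > TIMEOUT:
                self.status[peer] = "down"
                messages.append(("NODE_DOWN", peer))
        return messages

    def receive_node_down(self, target, now):
        """Rebut with a node alive message if `target` gossiped recently."""
        if now - self.last_heard[target] <= TIMEOUT:
            self.status[target] = "up"
            return ("NODE_ALIVE", target)
        self.status[target] = "down"  # assumption: accept the report if it cannot be rebutted
        return None

    def receive_node_alive(self, target, now):
        """Restore `target` to operational status."""
        self.status[target] = "up"
        self.last_heard[target] = now  # assumption: restart the timer to avoid re-flagging


if __name__ == "__main__":
    a = Node("A", ["B", "C"])
    b = Node("B", ["A", "C"])
    a.receive_gossip("B", 4.0)   # A hears from B
    b.receive_gossip("C", 4.0)   # only B hears from C
    for kind, target in a.check_timeouts(6.0):
        reply = b.receive_node_down(target, 6.0)
        print(kind, target, "->", reply)
        if reply is not None:
            a.receive_node_alive(reply[1], 6.0)
    print("A lists as operational:", a.gossip_payload())
```

In this sketch a node marked down is simply omitted from `gossip_payload`, which is one possible reading of the selective transmission recited at the end of claim 1.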