Title Distributed database management system with node failure detection
Abstract A node failure detector for use in a distributed database that is accessed through a plurality of interconnected transactional and archival nodes. Each node acts as an informer node that tests communications with every other node. Each informer node generates a list of suspicious nodes that resides in one node designated as a leader node. The leader node analyzes the data from all of the informer nodes to identify each node that should be designated for removal with appropriate failover procedures.
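The informer-node behavior described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the names `find_suspicious_nodes` and `build_report`, the `ping` callback, and the dictionary report format are all hypothetical and do not come from the patent text.

```python
from typing import Callable, Dict, Iterable, List


def find_suspicious_nodes(
    self_id: str,
    all_nodes: Iterable[str],
    ping: Callable[[str], bool],
) -> List[str]:
    """Ping every other node and collect those without a valid response.

    `ping` is a hypothetical callback that returns True when the peer
    sent back a valid ping acknowledgement and False otherwise.
    """
    suspicious = []
    for peer in all_nodes:
        if peer == self_id:
            continue  # an informer node does not test itself
        if not ping(peer):
            suspicious.append(peer)
    return suspicious


def build_report(self_id: str, suspicious: Iterable[str]) -> Dict[str, object]:
    """Package the informer's identity and its suspicious list for the leader.

    The dictionary layout is an assumption made for this sketch.
    """
    return {"informer": self_id, "suspicious": list(suspicious)}
```

In a real system the `ping` callback would wrap a network round trip with a timeout; here it is left abstract so the sketch stays self-contained.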
Publication number US9501363(B1) Publication date 2016.11.22
Application number US201414215372 Filing date 2014.03.17
Applicant NuoDB, Inc. Inventor Ottavio Daniel P.
Classification G06F11/00;G06F11/14;G06F11/30;G06F11/07 Main classification G06F11/00
Agency Cooley LLP Agent Cooley LLP
Main claim 1. In a distributed database management processing system including a plurality of nodes and means at each node for establishing a communications path with every other node in the system and each node includes means for transmitting a ping message and for transmitting a ping acknowledgement message in response to a received ping message, a method for detecting failures of individual nodes in the system comprising the steps of:
A) designating one node as a leader node for analyzing information,
B) at each node as an I-node node, transmitting a ping message to each other node as a receiving node, monitoring at the I-node the corresponding communications path for a valid response from each receiving node, and responding to an invalid response by designating the corresponding receiving node as a suspicious node,
C) generating a message for transmittal to the leader node with an identification of the I-node and the suspicious node,
D) responding to the message in the leader node by identifying other instances for which communications problems have been recorded with the identified suspicious node, determining a number of I-nodes included in the other instances, and where fewer than a majority of the I-nodes are included in other instances, sending an acknowledgement message to all the I-nodes, and
E) selectively designating suspicious nodes as failed where the majority of the I-nodes identify the suspicious nodes in a generated message or the majority of the I-nodes identify the suspicious nodes in a response to the acknowledgment message.
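One possible reading of the leader-side steps D) and E) is sketched below. It is a hedged illustration, not the claimed method itself: the class name `Leader`, the `poll_all` callback (standing in for sending the acknowledgement message to all I-nodes and collecting which of them also identify the suspect), and the choice to treat "majority" as a strict majority of all nodes in the system are assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


class Leader:
    """Hypothetical leader node that tallies suspicion reports from I-nodes."""

    def __init__(
        self,
        node_ids: Iterable[str],
        poll_all: Callable[[str], Set[str]],
    ) -> None:
        self.node_ids = set(node_ids)
        # Assumed stand-in for step D's acknowledgement round: given a
        # suspect, return the set of I-nodes that also identify it.
        self.poll_all = poll_all
        # suspect -> I-nodes that have reported problems with it
        self.reports: Dict[str, Set[str]] = defaultdict(set)
        self.failed: Set[str] = set()

    def _is_majority(self, count: int) -> bool:
        # Assumption: strict majority of every node in the system.
        return count > len(self.node_ids) / 2

    def on_report(self, informer: str, suspect: str) -> bool:
        """Handle one (I-node, suspicious node) message; True if suspect failed."""
        self.reports[suspect].add(informer)

        # Step E, first branch: a majority already reported the suspect.
        if self._is_majority(len(self.reports[suspect])):
            self.failed.add(suspect)
            return True

        # Step D: fewer than a majority so far, so query all I-nodes.
        confirmations = set(self.poll_all(suspect)) & self.node_ids

        # Step E, second branch: a majority identify it in their responses.
        if self._is_majority(len(confirmations)):
            self.failed.add(suspect)
            return True
        return False
```

With five nodes and a `poll_all` that returns no confirmations, the suspect would be marked failed only once a third distinct I-node reports it. The failover procedures that follow a failure designation are outside the scope of this sketch.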
Address Cambridge MA US