Title Minimizing false negative and duplicate health monitoring alerts in a dual master shared nothing database appliance
Abstract A primary master node and a standby master node monitor the health of a shared nothing database appliance to afford high availability while minimizing false negatives and duplicate alerts. Both nodes continuously execute, in parallel, complementary processes that determine whether the database is running and which master node is the active database master node. The active database master node monitors the health of the components of the database appliance by polling each component to detect failures and warnings, and the other master node monitors the status of the active master node. Upon detecting a failure of the active master node, the other node takes over health monitoring. If the database is not running, the designated primary master node performs health monitoring.
Publication No. US9164864(B1) Publication Date 2015.10.20
Application No. US201113338543 Filing Date 2011.12.28
Applicant EMC Corporation Inventors Novick Ivan D.; Heath Timothy; Kala Sharad
Classification G06F17/30; G06F11/30 Main Classification G06F17/30
Agency Agent Young Barry N.
Principal Claim 1. A method of monitoring the health of a database appliance comprising a database distributed on a plurality of database nodes, and having redundant master nodes including a primary master node and a standby master node, the database being active on and controlled by only one of said redundant master nodes at a time, the method comprising: executing concurrently in parallel and independently on both said redundant master nodes a database monitoring process, said database monitoring process comprising a resolution process and a health monitor process, the resolution process resolving on which one of said redundant master nodes said database is active at said time, said one node being designated the primary master node, and confirming that said primary master node is executing said health monitor process to monitor hardware and software components of said database and report alerts, the other redundant master node being the standby master node and not issuing alerts; resolving by executing said resolution process in parallel by said primary master node and said standby master node whether the database is running on said primary master node, including: attempting, by the primary master node, a first login to the database on the primary master node; and attempting concurrently, by the standby master node, a second login to the database on the primary master node; upon said first and second logins being successful, resolving by the resolution process on the primary master node that the database is running on the primary master node and that the primary master node is executing said health monitor process of hardware and software components of the database; monitoring by the standby master node the status of the primary master node to detect a failure of the primary master node, including: attempting, by the standby master node, a third login to the database on the primary master node after a first predetermined period of time; upon identifying that the third login attempt is unsuccessful, determining that the primary master node has failed based on the unsuccessful third login attempt; and upon determining said failure of the primary master node by the standby master node: attempting, by the standby master node, a fourth login to the database on the standby master node; upon the fourth login attempt by the standby master node being successful, determining that the database is active on said standby master node; and executing said health monitor process of said components of said database by the standby master node in response to determining that the fourth login attempt was successful.
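The login-based resolution recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `monitoring_node` and `issues_alerts` and the two login callables are hypothetical stand-ins for real database login attempts, and the fallback to the designated primary when neither login succeeds is taken from the abstract rather than from claim 1. Both master nodes would run the same resolution independently, and only the node it names reports alerts, which is how duplicate alerts are avoided.

```python
from typing import Callable

# Hypothetical stand-in: returns True when a database login attempt succeeds.
LoginFn = Callable[[], bool]


def monitoring_node(login_on_primary: LoginFn, login_on_standby: LoginFn) -> str:
    """Resolve which master node should run the health monitor process."""
    # Login to the database on the primary master succeeds:
    # the database is active there, so the primary monitors.
    if login_on_primary():
        return "primary"
    # Login on the primary failed, so the primary is treated as failed.
    # Try a login on the standby; if it succeeds, the standby takes over.
    if login_on_standby():
        return "standby"
    # Database not running on either master: per the abstract, the
    # designated primary master performs health monitoring.
    return "primary"


def issues_alerts(this_node: str, login_on_primary: LoginFn,
                  login_on_standby: LoginFn) -> bool:
    """Each node runs this independently; exactly one node gets True."""
    return monitoring_node(login_on_primary, login_on_standby) == this_node
```

For example, when the login on the primary fails and the login on the standby succeeds, `issues_alerts("standby", ...)` is true and `issues_alerts("primary", ...)` is false, so only one node reports. In a real appliance the login attempts would be repeated after the predetermined period the claim mentions; that timing loop is omitted here.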
Address Hopkinton MA US