Abstract |
Provided are a large-scale cluster monitoring system and a method for automatically building and restoring it. The monitoring system can be built automatically at large scale, and the monitoring environment can be rebuilt automatically when a node fails. The large-scale cluster monitoring system includes a CM server, a DB server, GM nodes, NA nodes, and a DB agent. The CM server manages the nodes in a large-scale cluster system. The DB server stores monitoring information, that is, the state information of the nodes in each group. Each GM node collects the monitoring information for the nodes in its corresponding group and stores it in the DB server. Each NA node accesses the CM server to obtain GM node information, collects the state information of the nodes in its corresponding group, and transfers it to the corresponding GM node. The DB agent monitors the per-group monitoring information stored in the DB server to detect possible node failures. |
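
The data flow described in the abstract can be illustrated with a minimal in-memory sketch. It is not the claimed implementation: the class names, method names, the timestamp-based staleness rule, and the `timeout` parameter are all assumptions made for illustration, and each NA node here reports only its own state as a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class DBServer:
    # group -> node -> (timestamp, state); in-memory stand-in for the DB server
    records: dict = field(default_factory=dict)

    def store(self, group, node, timestamp, state):
        self.records.setdefault(group, {})[node] = (timestamp, state)


@dataclass
class CMServer:
    # group -> GM node responsible for that group (hypothetical registry)
    gm_by_group: dict = field(default_factory=dict)

    def register_gm(self, group, gm):
        self.gm_by_group[group] = gm

    def lookup_gm(self, group):
        return self.gm_by_group[group]


@dataclass
class GMNode:
    group: str
    db: DBServer

    def receive(self, node, timestamp, state):
        # Store the monitoring information collected for this group.
        self.db.store(self.group, node, timestamp, state)


@dataclass
class NANode:
    name: str
    group: str
    cm: CMServer

    def report(self, timestamp, state):
        # Obtain GM node information from the CM server, then transfer state to it.
        gm = self.cm.lookup_gm(self.group)
        gm.receive(self.name, timestamp, state)


def detect_possible_failures(db, now, timeout):
    """DB agent check (assumed rule): flag nodes whose last record is older than timeout."""
    suspects = []
    for group, nodes in sorted(db.records.items()):
        for node, (timestamp, _state) in sorted(nodes.items()):
            if now - timestamp > timeout:
                suspects.append((group, node))
    return suspects


def demo():
    db = DBServer()
    cm = CMServer()
    cm.register_gm("g1", GMNode("g1", db))
    n1 = NANode("n1", "g1", cm)
    n2 = NANode("n2", "g1", cm)
    n1.report(timestamp=100, state={"cpu": 0.2})
    n2.report(timestamp=100, state={"cpu": 0.9})
    n1.report(timestamp=130, state={"cpu": 0.3})  # n2 stops reporting after t=100
    return detect_possible_failures(db, now=140, timeout=30)
```

In this sketch, `demo()` flags only the node that stopped reporting. A real DB agent would presumably then trigger the automatic restoration step, which is not modeled here.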