Main claim |
1. A computer-implemented method of handling alerts in a data center that includes multiple components, in which a fault in one of the components can result in a cascade of faults in other components, the method comprising:
receiving, at one or more processing devices, a first alert that indicates a first fault related to a first component of the multiple components;
receiving, at the one or more processing devices, a second alert that indicates a second fault related to a second component of the multiple components, wherein the first component affects the second component such that the first fault caused the second fault;
determining, using the one or more processing devices, a correlation between the first alert and the second alert using a set of rules that is based on a directed graph that reflects dependencies associated with the multiple components, including a dependency of the second component on the first component;
based on the determined correlation, determining that the first fault is a root cause of the first alert and the second alert;
providing an indication that the first fault is the root cause of the first alert and the second alert; and
predicting, based on the directed graph, triggering of at least a third alert that indicates a third fault in one of the multiple components, wherein the third fault occurs due to the second fault. |
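The steps recited above (correlating alerts over a dependency graph, identifying a root cause, and predicting further alerts) can be illustrated with a minimal Python sketch. This is not the claimed implementation: the component names, the `DEPENDENTS` mapping, and the functions `downstream`, `find_root_causes`, and `predict_next_alerts` are all hypothetical, and the sketch assumes the dependency graph is acyclic and that the "set of rules" reduces to simple reachability.

```python
from collections import deque

# Hypothetical dependency graph (assumption, not from the claim text).
# An edge A -> B means "B depends on A", so a fault in A can cascade to B.
DEPENDENTS = {
    "power_unit": ["switch"],
    "switch": ["server"],
    "server": ["database"],
    "database": [],
}

def downstream(graph, start):
    """Return every component reachable from `start`, excluding `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def find_root_causes(graph, alerted):
    """Alerting components that are not downstream of another alerting one."""
    alerted = set(alerted)
    explained = set()
    for comp in alerted:
        # Any alerting component reachable from `comp` is treated as a
        # consequence of the fault in `comp` (the correlation step).
        explained |= downstream(graph, comp) & alerted
    return alerted - explained

def predict_next_alerts(graph, alerted):
    """Components not yet alerting that depend on an alerting component."""
    alerted = set(alerted)
    at_risk = set()
    for comp in alerted:
        at_risk |= downstream(graph, comp)
    return at_risk - alerted

# First alert on "power_unit", second alert on "switch".
alerts = ["power_unit", "switch"]
roots = find_root_causes(DEPENDENTS, alerts)
predicted = predict_next_alerts(DEPENDENTS, alerts)
```

With these two alerts, `roots` contains only `power_unit`, and `predicted` contains `server` and `database`, the components that depend, directly or transitively, on the faulted `switch`. In a graph with cycles, two alerting components could each explain the other and `find_root_causes` could return an empty set, which is why acyclicity is assumed here.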