Main claim |
1. A system for failure event detection and grouping using adaptive polling intervals and sliding window buffering, said system comprising:
one or more memory areas associated with or accessible by a computing device storing a plurality of virtual machines (VMs) and in communication with one or more associated datastores, the memory areas including a value for a short timer, and a value for a long timer; and
a processor programmed to:
upon detection of a failure event affecting at least one of the plurality of VMs or associated datastores, initiate the short timer and the long timer and poll for additional failure events during each of a series of polling intervals, wherein the series of polling intervals continue until either the short timer or the long timer expires, wherein a duration of each subsequent polling interval of the series depends on whether an additional failure was detected during a respective preceding polling interval of the series, the polling during each polling interval of the series of polling intervals comprising:
upon detection of at least one of the additional failure events during a particular polling interval, collecting data relating to the detected at least one additional failure event, resetting the short timer, and reducing a duration of a next polling interval relative to the particular polling interval; and
upon no detection of at least one of the additional failure events during a particular polling interval, increasing a duration of a next polling interval relative to the particular polling interval;
group the detected failure event with the detected at least one additional failure event into a group of failure events; and
perform recovery operations in parallel for each failure event in the group of failure events. |
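
The polling behavior recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name here (`group_failures`, `recover_all`, `poll`, `now`, `wait`, `recover`) and every default value is a hypothetical chosen for illustration. The claim only requires that the next polling interval be reduced after a detection and increased otherwise; halving and doubling within fixed bounds is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def group_failures(first_event, poll, now, wait, short_timeout=10.0,
                   long_timeout=60.0, interval=2.0,
                   min_interval=0.5, max_interval=8.0):
    """Collect failure events that follow first_event into one group.

    poll() returns a list of failure events seen since the last call,
    now() returns the current time in seconds, wait(s) blocks for s seconds.
    All names and default values are illustrative assumptions.
    """
    group = [first_event]
    start = now()
    short_deadline = start + short_timeout   # short timer
    long_deadline = start + long_timeout     # long timer, never reset
    # Keep polling until either timer expires.
    while now() < short_deadline and now() < long_deadline:
        wait(interval)                       # one polling interval
        events = poll()
        if events:
            group.extend(events)             # collect the additional failures
            short_deadline = now() + short_timeout       # reset the short timer
            interval = max(min_interval, interval / 2)   # shorter next interval
        else:
            interval = min(max_interval, interval * 2)   # longer next interval
    return group


def recover_all(group, recover):
    """Run the hypothetical recover(event) callback for every event in parallel."""
    with ThreadPoolExecutor(max_workers=len(group)) as pool:
        return list(pool.map(recover, group))
```

Passing `now` and `wait` as parameters (for example `time.monotonic` and `time.sleep`) keeps the loop testable with a simulated clock. Because the short timer is reset on every detection, the long timer is what bounds the total collection time during a sustained burst of failures.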