Title of invention: System and method for dynamic problem determination using aggregate anomaly analysis
Abstract: A system and method are provided for determining problem conditions in an IT infrastructure using aggregate anomaly analysis. The anomalies in the metrics occurring in the monitored IT infrastructure are aggregated from all resources reporting metrics as a function of time. The aggregated metric anomalies are then normalized to account for the state of the monitored IT infrastructure to provide a normalized aggregate anomaly count. A threshold noise level is then determined utilizing a variably selectable desired level of confidence such that a problem event is only determined to likely be occurring in the IT infrastructure when the normalized aggregate anomaly count exceeds the threshold noise level. The normalized aggregate anomaly count is monitored against the threshold noise level as a function of time, such that a problem event in the IT infrastructure is identified when the normalized aggregate anomaly count exceeds the threshold noise level at a given time.
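The flow described in the abstract (aggregate, normalize, derive a threshold from a confidence level, compare over time) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the patented method: the function names are hypothetical, normalization is taken to be anomalies per reporting resource, and the threshold noise level is taken to be an empirical quantile of historical normalized counts. The record does not state how either quantity is actually computed.

```python
import math
from typing import List, Sequence


def normalized_counts(anomaly_counts: Sequence[int],
                      reporting_resources: Sequence[int]) -> List[float]:
    """Anomalies per reporting resource at each time step (assumed normalization)."""
    return [a / r if r > 0 else 0.0
            for a, r in zip(anomaly_counts, reporting_resources)]


def noise_threshold(history: Sequence[float], confidence: float) -> float:
    """Assumed threshold: the smallest historical value such that at least
    a `confidence` fraction of the history is less than or equal to it."""
    if not history:
        raise ValueError("history must not be empty")
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    ordered = sorted(history)
    index = math.ceil(confidence * len(ordered)) - 1
    return ordered[max(index, 0)]


def problem_times(current: Sequence[float], threshold: float) -> List[int]:
    """Indices of time steps whose normalized count exceeds the threshold."""
    return [t for t, value in enumerate(current) if value > threshold]
```

A higher confidence level pushes the threshold toward the largest values seen in the history, so fewer time steps are reported as problem events.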
Publication number: US9058259(B2)  Publication date: 2015.06.16
Application number: US200812242153  Filing date: 2008.09.30
Applicant: VMware, Inc.  Inventor: Marvasti Mazda A.
Classification: G06F11/07;H04L12/24;H04L12/26  Main classification: G06F11/07
Agency:  Agent:
Main claim: 1. A method comprising: determining an aggregated count of metric anomalies occurring in an information technology (IT) infrastructure, including obtaining a total count of all metric anomalies as a function of time for a set of resources supplying metric data that are being monitored on the IT infrastructure and adjusting the total count of all metric anomalies to produce the aggregated count of metric anomalies to account for the number of resources in the set of resources that are supplying metric data at a given time, the number of resources that are supplying the metric data being less than the number of resources in the set of resources; determining a threshold noise level for the aggregated count of metric anomalies above which a problem event is likely to be occurring in the IT infrastructure; identifying a problem event in the IT infrastructure when the aggregated count of metric anomalies exceeds the threshold noise level at a given time; issuing an alert when the problem event in the IT infrastructure is identified; and initiating a corrective action in response to the issued alert; wherein at least one of the determining the aggregated count of metric anomalies, the determining the threshold noise level and the identifying the problem event is executed by a processor.
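The steps of claim 1 for a single point in time (adjust the total count for the resources actually reporting, compare against the threshold noise level, issue an alert, initiate a corrective action) might be sketched as follows. This is only an illustration under stated assumptions: the names `Alert`, `adjust_count`, and `check_sample` are hypothetical, the adjustment is assumed to scale the total count up to the full monitored set, and the alert and corrective action are modeled as caller-supplied callbacks. The claim itself does not prescribe any of these choices.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Alert:
    time: int
    aggregated_count: float
    threshold: float


def adjust_count(total_anomalies: int, reporting: int, monitored: int) -> float:
    """Assumed adjustment: scale the total count as if every monitored
    resource had reported, given that only `reporting` of them did."""
    if not 0 < reporting <= monitored:
        raise ValueError("need 0 < reporting <= monitored")
    return total_anomalies * monitored / reporting


def check_sample(time: int, total_anomalies: int, reporting: int,
                 monitored: int, threshold: float,
                 issue_alert: Callable[[Alert], None],
                 corrective_action: Callable[[Alert], None]) -> Optional[Alert]:
    """Identify a problem event at one time step; alert first, then act."""
    aggregated = adjust_count(total_anomalies, reporting, monitored)
    if aggregated <= threshold:
        return None  # within the noise level: no problem event
    alert = Alert(time, aggregated, threshold)
    issue_alert(alert)
    corrective_action(alert)  # initiated in response to the issued alert
    return alert
```

For example, with 5 of 10 monitored resources reporting and a threshold of 25.0, a total of 10 anomalies adjusts to 20.0 and raises nothing, while a total of 15 adjusts to 30.0 and triggers both callbacks.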
Address: Palo Alto CA US