Title of invention: Multi-hop root cause analysis
Abstract: Methods for monitoring a networked computing environment and for identifying root causes of performance and availability issues that occur throughout multiple layers of the networked computing environment are described. In some embodiments, a software service provided by a networked computing environment may experience a service-related performance or availability issue. In response to detecting the service-related issue affecting the service, a root cause identification tool may aggregate data from a plurality of information technology management software tools monitoring the networked computing environment, identify causal relationships between a plurality of failures associated with the service-related issue based on the aggregated data, determine a chain of failures of the plurality of failures based on the causal relationships, identify a root cause of the service-related issue based on the chain of failures, and transmit an alarm corresponding with the root cause.
Publication number: US9497071(B2) Publication date: 2016.11.15
Application number: US201414242865 Filing date: 2014.04.01
Applicant: CA, INC. Inventors: Gates Carrie E.;Greenspan Steven L.;Velez-Rojas Maria C.;Mankovskii Serguei
Classification: G06F11/00;H04L12/24;G06F11/07;H04L12/26;G06F11/30 Main classification: G06F11/00
Agency: Vierra Magen Marcus LLP Agent: Vierra Magen Marcus LLP
Main claim: 1. A method for monitoring a networked computing environment, comprising: detecting an alarm corresponding with a performance issue in the networked computing environment, the alarm is associated with a time of failure; identifying a first application associated with the performance issue; acquiring an infrastructure mapping for the first application, the infrastructure mapping maps the first application to components of the networked computing environment that supported the first application at the time of failure; aggregating a plurality of alarms from a plurality of monitoring applications monitoring the networked computing environment; generating a failure graph based on the infrastructure mapping and the plurality of alarms; determining a chain of failures based on the failure graph, the determining a chain of failures comprises determining the chain of failures based on an estimated time to fix a failure associated with a first leaf node of the chain of failures; identifying a root cause of the performance issue based on the chain of failures; and outputting the root cause of the performance issue.
Address: New York NY US
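
The steps recited in the main claim (infrastructure mapping, alarm aggregation, failure graph, chain of failures, root cause) can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`Alarm`, `build_failure_graph`, `failure_chains`, `identify_root_cause`, the component names, and the 300-second alarm window) is hypothetical. The claim only says the chain is determined "based on" the estimated time to fix the leaf-node failure; the sketch assumes one possible reading, namely preferring the chain whose leaf is quickest to fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    component: str  # component that raised the alarm
    source: str     # monitoring tool that reported it (hypothetical names)
    time: float     # timestamp in seconds


def build_failure_graph(mapping, alarms, app, time_of_failure, window=300.0):
    """Return {failed component: [failed dependencies]} reachable from app.

    Assumption: an alarm counts only if it falls within `window` seconds
    of the time of failure.
    """
    failed = {a.component for a in alarms
              if abs(a.time - time_of_failure) <= window}
    failed.add(app)
    graph, stack = {}, [app]
    while stack:
        node = stack.pop()
        if node in graph:
            continue
        graph[node] = [d for d in mapping.get(node, []) if d in failed]
        stack.extend(graph[node])
    return graph


def failure_chains(graph, app):
    """Enumerate every path from the application to a leaf of the failure graph."""
    chains = []

    def walk(node, path):
        path = path + [node]
        children = [c for c in graph[node] if c not in path]  # skip cycles
        if not children:
            chains.append(path)
        for child in children:
            walk(child, path)

    walk(app, [])
    return chains


def identify_root_cause(mapping, alarms, app, time_of_failure, fix_minutes):
    """Pick the chain whose leaf has the lowest estimated time to fix (assumed rule)."""
    graph = build_failure_graph(mapping, alarms, app, time_of_failure)
    chains = failure_chains(graph, app)
    chain = min(chains, key=lambda c: fix_minutes.get(c[-1], float("inf")))
    return chain[-1], chain


# Hypothetical infrastructure mapping: component -> components it depends on.
mapping = {
    "billing-app": ["web-server", "db-server"],
    "web-server": ["vm-1"],
    "db-server": ["storage-array"],
    "vm-1": ["host-a"],
}
# Alarms aggregated from several (made-up) monitoring tools.
alarms = [
    Alarm("web-server", "apm-tool", 1000.0),
    Alarm("vm-1", "vm-monitor", 990.0),
    Alarm("db-server", "db-monitor", 995.0),
    Alarm("storage-array", "storage-monitor", 980.0),
    Alarm("host-a", "hw-monitor", 10.0),  # far outside the window, ignored
]
fix_minutes = {"vm-1": 15, "storage-array": 120}

root, chain = identify_root_cause(mapping, alarms, "billing-app", 1000.0, fix_minutes)
print(root, chain)
```

In this sketch the root cause is the leaf of the selected chain, so each chain from the application down to a leaf is a multi-hop candidate explanation. Other selection rules, such as ranking chains by alarm timestamps, would fit the same structure.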