Title: Monitoring and analysis of operating states in a computing environment
Abstract: A set of techniques is described for monitoring and analyzing crashes and other malfunctions in a multi-tenant computing environment (e.g., a cloud computing environment). The computing environment may host many applications that are executed on different combinations of computing resources. These combinations may include varying types and versions of hardware or software resources. A monitoring service is deployed to gather statistical data about the failures occurring in the computing environment. The statistical data is then analyzed to identify patterns of abnormally high failure rates. The failure patterns may be associated with particular computing resource combinations being used to execute particular types of applications. Based on these failure patterns, a user can be advised to execute the application on a different computing resource combination. Alternatively, the failure patterns may be used to modify or update the various resources in order to correct the potential malfunctions those resources cause.
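The monitor, aggregate, and suggest loop described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ExecutionRecord` schema, the tuple encoding of a resource combination, and the rule of suggesting the combination with the lowest observed failure rate are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionRecord:
    """One monitored run of an application (hypothetical schema)."""
    app_type: str        # kind of application, e.g. "web"
    combination: tuple   # resource combination, e.g. ("hw-gen2", "os-1.4")
    failed: bool         # True if the run crashed or otherwise malfunctioned


def failure_rates(records):
    """Aggregate raw records into {(app_type, combination): failure rate}."""
    runs = defaultdict(int)
    failures = defaultdict(int)
    for r in records:
        key = (r.app_type, r.combination)
        runs[key] += 1
        failures[key] += int(r.failed)
    return {key: failures[key] / runs[key] for key in runs}


def suggest_combination(records, app_type, current):
    """Return the combination with the lowest observed failure rate for
    app_type, or None if it is not strictly better than `current`.
    (Hypothetical suggestion rule, for illustration only.)"""
    rates = {
        combo: rate
        for (app, combo), rate in failure_rates(records).items()
        if app == app_type
    }
    if not rates:
        return None  # no data for this application type
    best = min(rates, key=rates.get)
    # A current combination never observed for this app type is treated as worst-case.
    if best == current or rates[best] >= rates.get(current, float("inf")):
        return None
    return best
```

For example, if "web" applications fail on 3 of 10 runs on combination `("A",)` but on only 1 of 10 runs on `("B",)`, then `suggest_combination(records, "web", ("A",))` returns `("B",)`. A real service would also need enough runs per combination before trusting such a comparison; this sketch does not check sample size.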
Publication number: US9037922 (B1); Publication date: 2015-05-19
Application number: US201213461068; Filing date: 2012-05-01
Applicant: Amazon Technologies, Inc.; Inventors: Cabrera Luis Felipe; Brandwine Eric Jason; Hamilton James R.; Jenkins Jonathan A.; Klein Matthew D.; Thomas Nathan; Vincent Pradeep
Classification: G06F11/00; G06F11/30; Main classification: G06F11/00
Agency: Hogan Lovells US LLP; Agent: Hogan Lovells US LLP
Principal claim: 1. A computer implemented method for failure monitoring, said method comprising: under the control of one or more computer systems configured with executable instructions, monitoring a performance of a plurality of applications in a multi-tenant environment over a period of time, the applications being provided using a plurality of different combinations of resources; detecting at least one abnormal execution condition for at least one of the applications in the multi-tenant computing environment during the period of time; recording information for the at least one abnormal execution condition; analyzing the recorded information to generate statistical data about the at least one abnormal execution condition throughout the multi-tenant computing environment; identifying at least one statistically significant correlation between the at least one abnormal execution condition and the combination of resources hosting the at least one application based at least in part on the statistical data, wherein identifying the at least one statistically significant correlation includes determining that the at least one of the applications hosted on a particular combination of resources has failed more frequently than the at least one of the applications hosted on at least one other combination of resources by more than a predetermined threshold; and determining a modification for the combination of resources hosting the at least one of the applications, the modification being based at least in part on the at least one statistically significant correlation.
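The threshold test recited in claim 1 can be illustrated with a short sketch. It rests on hypothetical interpretations: that "failed more frequently" means a higher per-run failure rate, that the predetermined threshold is an absolute difference between rates, and that the "modification" is a move to the best-performing other combination (the claim is broader and also covers, for example, updating the resources themselves). The function name `propose_modifications` and its input format are illustrative only.

```python
def propose_modifications(outcomes, threshold):
    """For ONE application type, map each over-failing resource combination
    to a suggested replacement combination.

    outcomes:  {combination: (failures, runs)}  (hypothetical input format)
    threshold: predetermined margin, as an absolute difference in failure rate
    """
    # Per-run failure rate; combinations with no runs carry no evidence.
    rates = {c: f / n for c, (f, n) in outcomes.items() if n > 0}
    proposals = {}
    for combo, rate in rates.items():
        others = {c: r for c, r in rates.items() if c != combo}
        if not others:
            continue  # nothing to compare against
        best = min(others, key=others.get)
        # A combination fails more often than at least one other combination
        # by more than the threshold exactly when it exceeds the lowest-rate
        # other combination by more than the threshold.
        if rate - others[best] > threshold:
            proposals[combo] = best  # hypothetical "modification": switch to best
    return proposals
```

With `outcomes = {"A": (30, 100), "B": (10, 100), "C": (12, 100)}` and a threshold of `0.05`, only `"A"` is flagged (30% versus 10%), and the sketch proposes moving it to `"B"`; `"C"` exceeds `"B"` by only two percentage points and is left alone.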
Address: Reno, NV, US