发明名称 Model checking for distributed application validation
摘要 A model checking system is described herein that more effectively verifies and validates the design of distributed applications by providing a model and a generic framework to check application invariant properties, detect anomaly behaviors, and monitor application health. The model checking system checks on-line application behavior against application models derived from formal descriptions of the application. The system formulates the concrete application as an abstract model and a number of rules or properties that are expected to hold for the application under all conditions. The model checker compares the actual application execution with the models and either confirms that the properties hold true or reports that the properties are violated. Thus, the model checking system provides more efficient and thorough validation of distributed applications under more realistic production conditions.
申请公布号 US9092561(B2) 申请公布日期 2015.07.28
申请号 US201012908007 申请日期 2010.10.20
申请人 Microsoft Technology Licensing, LLC 发明人 Miao Lidan;Song Peng;Zhang Li;Tarta Mihail G.
分类号 G06F11/36;G06F11/34 主分类号 G06F11/36
代理机构 代理人 Mehta Aneesh;Drakos Kate;Minhas Micky
主权项 1. A computer-implemented method for executing a distributed application and verifying correctness of distributed system behavior, the method comprising: receiving a plurality models that collectively describe expected behavior of the distributed application, wherein the plurality of models are generated by monitoring the distributed application in a learning mode and wherein one or more of the plurality of models are further updated in the learning mode upon determination that an error reported was an erroneous error, wherein receiving the plurality models comprises receiving one or more invariant properties of the distributed application, wherein the plurality of models comprises at least one of a finite state machines (FSM) model representing an individual machine in the distributed application, wherein each FSM includes statistics for verification for each state in the FSM, an error model describing at least one stable property of the distributed application, the error model further comprising an error type identifier, an error severity indicator and a key item for extracting correlation keys for event correlation and a performance metric model, the performance metric model comprises a hard bound threshold beyond which performance statistics cannot go or a soft bound threshold beyond which performance statistics cannot stay over a set period of time, the plurality of models further comprising a dependency graph that correlates events among the plurality of models; starting a model checking service online or offline that monitors execution of the distributed application to detect behavior of the application without inserting annotations into the distributed application; collecting one or more traces that describe behavior of the application as a series of one or more events that occur at distributed components of the application; identifying one or more events based on the collected traces; checking the received plurality of models against the identified events to determine whether any unexpected application behavior has occurred wherein the unexpected behavior is selected from a group consisting of state transitions that violate one of the FSM models, an error specified by one of the error models or a violation of one of the performance metric models; and upon determining that a model check failed by identifying events correlated to the unexpected behavior in the dependency graph, identifying a root cause event related to the failed model check; and reporting an error based on the identified root cause and failed model check, wherein the preceding steps are performed by at least one processor.
地址 Redmond WA US