Abstract
A probe mechanism detects failed software components in a running software system. It works by requesting a service, or a certain level of service, from a set of functions, modules and/or subsystems and checking the response to that request. Each probe is directed at a service rendered by a collection of software modules and functions, termed a target, and labels that target as either healthy or failed. The objective is to localize a failure only down to the level of a target, while achieving a high degree of efficiency and confidence in the process. Targets are chosen so that each represents a collection of functions that can be defined by a service-level input/output (I/O) specification. Targets can be identified at different levels or layers of the software; the choice of level depends on the desired granularity of fault detection, considered together with the level at which recovery can be implemented. To further improve its operation, the probe is made self-testing against any single failure in its operational components. The self-testing technique uses the probe paradigm to define a null probe, which tests the probe dispatcher, and to create a null failure, which in turn tests the probe analyzer. The probe mechanism may be implemented on either a single-computer or a multiple-computer system.
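The division of labor described above (targets defined by an I/O specification, a dispatcher that sends probes, an analyzer that labels targets, and a null probe plus null failure for self-testing) can be illustrated with a minimal single-process sketch. All names here (`Target`, `ProbeDispatcher`, `ProbeAnalyzer`, `NULL_TARGET`, `self_test`, `probe_all`, the demo targets) are hypothetical and chosen for illustration only; the sketch assumes a target's specification can be expressed as one request plus a predicate on the response, and it omits timeouts and multi-computer dispatch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List, Optional


class Health(Enum):
    HEALTHY = "healthy"
    FAILED = "failed"


@dataclass
class Target:
    """Hypothetical target: a group of functions exposed as one service."""
    name: str
    service: Callable[[Any], Any]      # entry point of the service under test
    request: Any                       # probe input taken from the I/O spec
    meets_spec: Callable[[Any], bool]  # checks a response against the I/O spec


@dataclass
class ProbeResult:
    target: str
    response: Any
    error: Optional[BaseException]


class ProbeDispatcher:
    """Sends the probe request to a target and records the raw outcome."""

    def dispatch(self, target: Target) -> ProbeResult:
        try:
            return ProbeResult(target.name, target.service(target.request), None)
        except Exception as exc:  # a crashing target yields a result, not a probe crash
            return ProbeResult(target.name, None, exc)


class ProbeAnalyzer:
    """Labels a target healthy or failed from its probe result."""

    def analyze(self, target: Target, result: ProbeResult) -> Health:
        if result.error is None and target.meets_spec(result.response):
            return Health.HEALTHY
        return Health.FAILED


# Hypothetical null target: a trivial echo service whose correct answer is known.
NULL_TARGET = Target("null", lambda x: x, "ping", lambda r: r == "ping")


def self_test(dispatcher: ProbeDispatcher, analyzer: ProbeAnalyzer) -> bool:
    """Null probe exercises the dispatcher; a null failure exercises the analyzer."""
    # Null probe: a working dispatcher must deliver the echoed request.
    null_result = dispatcher.dispatch(NULL_TARGET)
    if null_result.error is not None or null_result.response != "ping":
        return False
    if analyzer.analyze(NULL_TARGET, null_result) is not Health.HEALTHY:
        return False
    # Null failure: a fabricated bad response that a working analyzer must flag.
    null_failure = ProbeResult(NULL_TARGET.name, "not-ping", None)
    return analyzer.analyze(NULL_TARGET, null_failure) is Health.FAILED


def probe_all(targets: List[Target]) -> Dict[str, Health]:
    """Self-test the probe first, then label every target."""
    dispatcher, analyzer = ProbeDispatcher(), ProbeAnalyzer()
    if not self_test(dispatcher, analyzer):
        raise RuntimeError("probe self-test failed; results would be untrustworthy")
    return {t.name: analyzer.analyze(t, dispatcher.dispatch(t)) for t in targets}


def broken_lookup(key: Any) -> Any:
    raise KeyError(key)  # simulates a failed component inside a target


# Hypothetical demo targets: one healthy service, one that fails.
DEMO_TARGETS = [
    Target("sorter", sorted, [3, 1, 2], lambda r: r == [1, 2, 3]),
    Target("lookup", broken_lookup, "k", lambda r: r is not None),
]

if __name__ == "__main__":
    labels = probe_all(DEMO_TARGETS)
    print({name: h.value for name, h in labels.items()})
    # prints {'sorter': 'healthy', 'lookup': 'failed'}
```

Running the self-test before each probing pass is one possible design choice: if either the dispatcher or the analyzer has failed, the probe refuses to report target health rather than mislabeling healthy targets as failed (or the reverse).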