Abstract |
An intelligent system for automatically monitoring, diagnosing, and repairing complex hardware and software systems is presented. Its functional modules collect relevant data from both hardware and software components, analyze the incoming data to detect faults, combine ongoing sensor data with historical knowledge to predict potential faults, determine an appropriate response to each fault, and automatically repair faults when appropriate. The system uses both software and hardware modules to interact with the complex system being monitored. Additionally, lessons learned on one system can be applied to better understand events occurring on the same or similar systems. |