Abstract |
<p>A hierarchical, distributed Availability Management (AM) process for recovering from component failures in a data processing system. The hierarchy of AM elements tracks a failure modality hierarchy of the data processing system components. Each AM element is responsible for receiving failure notifications from processing system components associated with the next lower level of the hierarchy. Upon receiving such a notification, if the AM element determines that the failed component may be restarted, it determines whether the restart can be hot, warm, or cold. One of the AM processes may execute as an identity management protocol. The identity protocol sets a temporary master state, waits a predetermined amount of time, and then sets a final master state only if no other system card has asserted a temporary master state. The waiting period is selected to be greater than the longest expected initialization process for peer components in the system.</p> |
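The identity protocol summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Backplane` class, the `negotiate_master` and `choose_wait` functions, the injected `wait` callable, and the safety margin are all hypothetical names and parameters introduced here for illustration. The sketch also assumes that a card which sees another card's claim simply withdraws its own, and that an existing final claim blocks a new one just as a temporary claim does; the abstract does not describe either behavior.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class MasterState(Enum):
    NONE = "none"
    TEMPORARY = "temporary"
    FINAL = "final"


@dataclass
class Backplane:
    """Hypothetical shared medium on which each card publishes its master state."""
    states: Dict[str, MasterState] = field(default_factory=dict)

    def assert_state(self, card_id: str, state: MasterState) -> None:
        self.states[card_id] = state

    def other_claims(self, card_id: str) -> bool:
        # Assumption: any temporary or final claim by another card counts.
        return any(
            state is not MasterState.NONE
            for other_id, state in self.states.items()
            if other_id != card_id
        )


def choose_wait(longest_init_seconds: float, margin_seconds: float) -> float:
    """Pick a wait longer than the longest expected peer initialization.

    The margin is an assumption; the abstract only requires 'greater than'.
    """
    return longest_init_seconds + margin_seconds


def negotiate_master(
    card_id: str,
    backplane: Backplane,
    wait: Callable[[float], None],
    wait_seconds: float,
) -> bool:
    """Return True if this card ends up in the final master state."""
    # Step 1: set a temporary master state.
    backplane.assert_state(card_id, MasterState.TEMPORARY)
    # Step 2: wait the predetermined time (e.g. time.sleep on real hardware).
    wait(wait_seconds)
    # Step 3: become final master only if no other card has made a claim.
    if backplane.other_claims(card_id):
        backplane.assert_state(card_id, MasterState.NONE)  # assumed back-off
        return False
    backplane.assert_state(card_id, MasterState.FINAL)
    return True
```

In a real system `wait` would be something like `time.sleep`; passing it in as a parameter keeps the sketch testable without real delays. Note that if two cards assert a temporary state during overlapping wait periods, both back off under this sketch, so a deployed protocol would need a retry or tie-breaking rule that the abstract does not specify.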