Abstract |
A distributed system provides error handling. The system includes multiple nodes, each node being coupled to multiple node controllers for control redundancy. Multiple system controllers couple to the node controllers via a network bus. A particular node controller may detect an error within itself. That node controller may store error information relating to the detected error in respective nonvolatile memory stores in the system controllers and node controllers according to a particular priority order. For example, the node controller may first attempt to store the error information to a primary system controller memory store, then to a secondary system controller memory store, and then to sibling and non-sibling node controller memory stores. The primary system controller organizes the available error information for use by system administrators and other resources of the distributed system.
|
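
The priority-ordered storage described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The `MemoryStore` class, the `store_error` function, the store names, and the sample error record are all hypothetical. The sketch also assumes that the first successful write ends the attempt, which the abstract does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical stand-in for one controller's nonvolatile memory store."""
    name: str
    reachable: bool = True
    records: list = field(default_factory=list)

    def write(self, record: dict) -> bool:
        # A real store would be written over the network bus; in this
        # sketch an unreachable store simply reports failure.
        if not self.reachable:
            return False
        self.records.append(record)
        return True


def store_error(record, stores_in_priority_order):
    """Try each store in order and return the name of the first store
    that accepts the record, or None if every attempt fails."""
    for store in stores_in_priority_order:
        if store.write(record):
            return store.name
    return None


# Priority order taken from the abstract: primary system controller,
# secondary system controller, sibling, then non-sibling node controllers.
primary = MemoryStore("primary_system_controller", reachable=False)
secondary = MemoryStore("secondary_system_controller")
sibling = MemoryStore("sibling_node_controller")
non_sibling = MemoryStore("non_sibling_node_controller")
priority_order = [primary, secondary, sibling, non_sibling]

# Assumed example record; the abstract does not define a record format.
error_record = {"source": "node_controller_A", "error": "watchdog timeout"}
used_store = store_error(error_record, priority_order)
```

In this example the primary store is marked unreachable, so the record falls through to the secondary system controller store and the lower-priority node controller stores are never tried.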