Abstract |
Systems and methods for implementing recovery processes on failed nodes in a distributed computing environment are described. In this scheme, one or more migratory recovery modules are launched into the network. The recovery modules migrate from node to node, determine the status of each node, and initiate recovery processes on failed nodes. In this way, scalable recovery processes may be implemented in distributed systems, even when network topology and membership information are incomplete. In addition, the complexity and cost associated with manual status monitoring and recovery operations may be avoided.
|
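The migrate, check, and recover loop described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: the `Node` class, its `recover` method, and the `run_recovery_module` function are hypothetical names that do not come from the source, and the breadth-first visiting order is an arbitrary choice, since the abstract does not specify how a module selects its next hop. The point it tries to show is that the module needs only each node's local neighbor list, not a global membership table.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node that knows only its own neighbors, not the full topology."""
    name: str
    failed: bool = False
    neighbors: list = field(default_factory=list, repr=False, compare=False)

    def recover(self):
        # Stand-in for a real recovery process (assumed; e.g. restarting a service).
        self.failed = False


def run_recovery_module(start):
    """Simulate one recovery module migrating through the network.

    Returns the names of the nodes on which recovery was initiated,
    in the order they were visited.
    """
    visited = {start.name}
    pending = deque([start])
    recovered = []
    while pending:
        node = pending.popleft()       # the module "arrives" at this node
        if node.failed:                # determine the node's status
            node.recover()             # initiate the recovery process
            recovered.append(node.name)
        for peer in node.neighbors:    # learn next hops from local information only
            if peer.name not in visited:
                visited.add(peer.name)
                pending.append(peer)
    return recovered


# Small example network: a - b - d, a - c; nodes b and d have failed.
a, b, c, d = Node("a"), Node("b", failed=True), Node("c"), Node("d", failed=True)
a.neighbors = [b, c]
b.neighbors = [a, d]
c.neighbors = [a]
d.neighbors = [b]

recovered = run_recovery_module(a)
print(recovered)
```

In this toy network the module starts at `a` and discovers `d` only by way of `b`, which mirrors the claim that recovery can proceed without complete topology information. A real deployment would also have to handle how a module reaches and acts on a node that is actually down, which this simulation does not model.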