Abstract
A method for fault tolerance and fault recovery in multiprocessor systems that concurrently manage queues is disclosed. The illustrative embodiment comprises a plurality of servers, a queue of jobs to be assigned to the servers, and two queue managers (a primary unit and a secondary unit), arranged so that the secondary unit fills in for the primary unit while the primary unit is down. The illustrative embodiment provides for smooth transitions from the normal state into the failure state and back into the normal state without losing jobs or violating the queue discipline of the system.
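The failover behavior summarized above can be illustrated with a minimal sketch. It assumes a FIFO queue discipline and assumes that every enqueue and dequeue is mirrored to both queue managers while they are up. The abstract does not say how state is replicated, so this is an assumption. The class and method names (`ReplicatedJobQueue`, `assign`, `fail_primary`, `recover_primary`) are hypothetical and are not the disclosed embodiment.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueueManager:
    """Hypothetical queue manager; primary and secondary each hold a replica."""
    name: str
    up: bool = True
    jobs: deque = field(default_factory=deque)


class ReplicatedJobQueue:
    """Sketch of primary/secondary failover over a FIFO job queue.

    Assumption (not from the source): every queue operation is applied to
    each live replica, so the secondary can take over without losing jobs.
    """

    def __init__(self):
        self.primary = QueueManager("primary")
        self.secondary = QueueManager("secondary")

    def active(self):
        # The primary serves while it is up; otherwise the secondary fills in.
        return self.primary if self.primary.up else self.secondary

    def _replicas(self):
        # Only managers that are currently up receive updates.
        return [m for m in (self.primary, self.secondary) if m.up]

    def submit(self, job):
        for m in self._replicas():
            m.jobs.append(job)

    def assign(self, server):
        # Hand the oldest job (FIFO discipline) to the given server.
        leader = self.active()
        if not leader.jobs:
            return None
        job = leader.jobs.popleft()
        # Mirror the removal on the other replica if it is up.
        for m in self._replicas():
            if m is not leader:
                m.jobs.popleft()
        return server, job

    def fail_primary(self):
        self.primary.up = False

    def recover_primary(self):
        # Resynchronize the recovered primary from the secondary before it
        # resumes, so no job is lost or reordered across the transition.
        self.primary.jobs = deque(self.secondary.jobs)
        self.primary.up = True
```

For example, after submitting `j1`, `j2`, `j3` and assigning `j1`, failing the primary, submitting `j4`, and later recovering the primary, the remaining jobs are still handed out in the order `j2`, `j3`, `j4`. Copying the secondary's queue on recovery is the simplest way to restore the normal state in this sketch. A real system would need a replication protocol that also handles the secondary failing.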