Abstract
A multiprocessor computer system continues operation after the failure of a cooling device coupled to a central processing unit (CPU). In accordance with the present invention, an impending failure of a cooling device is detected, and all user and operating system processes are moved from the affected CPU, which is coupled to the failing cooling device, to one or more other CPUs. The system state is then altered so that interrupts are no longer received and processed by the affected CPU, and all memory caches associated with the affected CPU are flushed back to main memory to ensure cache coherency. At this point, the CPU is either powered down or placed in a low-power mode that allows it to operate without the cooling device, while the processes that were removed from the suspended CPU continue executing on other CPUs. After the cooling device has been replaced and is operating normally, the CPU can be powered back up, interrupts can be re-enabled, and the CPU can once again execute user and operating system processes.
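The sequence described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the patented implementation: the names `Cpu`, `CpuState`, `retire_cpu`, and `restore_cpu` are hypothetical, the round-robin placement of migrated processes is an arbitrary choice, and the `dirty_cache_lines` counter merely stands in for real cache contents.

```python
from dataclasses import dataclass, field
from enum import Enum


class CpuState(Enum):
    ONLINE = "online"
    LOW_POWER = "low_power"
    POWERED_DOWN = "powered_down"


@dataclass
class Cpu:
    # Hypothetical model of one CPU; not taken from the source.
    cpu_id: int
    state: CpuState = CpuState.ONLINE
    interrupts_enabled: bool = True
    processes: list = field(default_factory=list)
    dirty_cache_lines: int = 0  # stand-in for cache contents not yet in main memory


def retire_cpu(cpus, affected_id, low_power=False):
    """Take the CPU with the failing cooling device out of service."""
    affected = cpus[affected_id]
    targets = [
        c for c in cpus.values()
        if c.cpu_id != affected_id and c.state is CpuState.ONLINE
    ]
    if not targets:
        raise RuntimeError("no other online CPU can take over the processes")

    # 1. Move all processes to the other CPUs (round-robin is an assumption).
    for i, proc in enumerate(affected.processes):
        targets[i % len(targets)].processes.append(proc)
    affected.processes = []

    # 2. Stop delivering interrupts to the affected CPU.
    affected.interrupts_enabled = False

    # 3. Flush the affected CPU's caches back to main memory.
    affected.dirty_cache_lines = 0

    # 4. Power the CPU down, or place it in a low-power mode.
    affected.state = CpuState.LOW_POWER if low_power else CpuState.POWERED_DOWN


def restore_cpu(cpus, cpu_id):
    """Bring the CPU back once the cooling device has been replaced."""
    cpu = cpus[cpu_id]
    cpu.state = CpuState.ONLINE
    cpu.interrupts_enabled = True
```

For example, with two CPUs built as `cpus = {0: Cpu(0, processes=["init", "shell"]), 1: Cpu(1, processes=["db"])}`, calling `retire_cpu(cpus, 0)` leaves CPU 0 powered down with no processes, and `restore_cpu(cpus, 0)` makes it available to the scheduler again. The ordering in the sketch mirrors the abstract: processes leave first, interrupts stop next, and the cache flush happens before power is removed, so no modified data is lost.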