Abstract |
Fatal errors are uncorrectable hardware errors that force entire applications to restart and, at worst, cause the machine to reboot. A method of recovering from a fatal error in a system having a plurality of components, the system including a processor for executing a plurality of processes, comprises detecting an error in the system, determining which of the components caused the error, isolating the processes affected by the error, and recovering from the error. Error recovery can be assisted by designing processes to use checkpointing, in which a backup of data pages is taken at predetermined points in a process, so that minimal loss of transactions occurs in the event of a fatal error.
|
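
The checkpointing idea in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the names `CheckpointedProcess`, `FatalError`, `run`, and `checkpoint_interval` are hypothetical, the "data pages" are modelled as a plain dictionary, and the "predetermined points" are assumed to be every N applied transactions. The fatal error is simulated in software rather than detected in hardware.

```python
import copy


class FatalError(Exception):
    """Hypothetical stand-in for an uncorrectable hardware error."""


class CheckpointedProcess:
    """Toy process that backs up its data pages at fixed intervals."""

    def __init__(self, checkpoint_interval):
        self.checkpoint_interval = checkpoint_interval
        self.pages = {}            # working data pages: page id -> value
        self.applied = 0           # transactions applied to the working pages
        self._backup_pages = {}    # copy of the pages at the last checkpoint
        self._backup_applied = 0   # transaction count at the last checkpoint

    def checkpoint(self):
        # Take a backup of the data pages at a predetermined point.
        self._backup_pages = copy.deepcopy(self.pages)
        self._backup_applied = self.applied

    def apply(self, page_id, value):
        # Apply one transaction, then checkpoint every N transactions.
        self.pages[page_id] = value
        self.applied += 1
        if self.applied % self.checkpoint_interval == 0:
            self.checkpoint()

    def recover(self):
        # Roll back to the last checkpoint; return how many transactions were lost.
        lost = self.applied - self._backup_applied
        self.pages = copy.deepcopy(self._backup_pages)
        self.applied = self._backup_applied
        return lost


def run(transactions, fail_at, checkpoint_interval=3):
    """Apply transactions, simulating a fatal error before index `fail_at`."""
    proc = CheckpointedProcess(checkpoint_interval)
    lost = 0
    try:
        for i, (page_id, value) in enumerate(transactions):
            if i == fail_at:
                raise FatalError("simulated uncorrectable error")
            proc.apply(page_id, value)
    except FatalError:
        lost = proc.recover()
    return proc, lost
```

In this sketch the number of lost transactions is always smaller than `checkpoint_interval`, which is the sense in which checkpointing keeps the loss minimal: a shorter interval loses less work at the cost of more frequent backups.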