Abstract |
The objective of the present invention is to reduce the operator's re-execution workload when a failure occurs in a plurality of jobs that are executed in parallel using a shared file. When a jobnet that uses a shared file and includes a plurality of jobs executed in parallel is run, a shared file-determining unit determines whether or not a file used by the jobs is a shared file, a checkpoint-managing unit sets a checkpoint when the jobs write data into a file determined to be a shared file, a file copy-processing unit creates a duplicate of the shared file used by the jobs, and a process copy-processing unit creates a duplicate of a process of the jobs. When a fault-state-detecting unit detects a fault state in a job being executed, the checkpoint-managing unit determines the checkpoint from which to restart processing of the job and restarts the job using the duplicate of the process and the duplicate of the shared file that was created when the determined checkpoint was set. |
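
The checkpoint-and-restart flow described above can be illustrated with a minimal single-process sketch. It is not the claimed implementation: the class `CheckpointManager`, its methods, and the `demo` function are hypothetical names, the "duplicate of a process" is approximated by a deep copy of an in-memory state dictionary, and the sketch assumes the checkpoint is taken immediately before each write to a shared file and that the most recent checkpoint is the one chosen for restart.

```python
import copy
import os
import shutil
import tempfile


class CheckpointManager:
    """Hypothetical stand-in for the units named in the abstract."""

    def __init__(self, shared_paths, backup_dir):
        # Assumption: the set of shared files is known in advance.
        self.shared_paths = {os.path.abspath(p) for p in shared_paths}
        self.backup_dir = backup_dir
        self.checkpoints = []  # entries: (file path, file copy path, state copy)

    def is_shared(self, path):
        # Role of the shared file-determining unit.
        return os.path.abspath(path) in self.shared_paths

    def write(self, path, text, job_state):
        if self.is_shared(path):
            # Assumption: the checkpoint is set just before the write.
            file_copy = os.path.join(self.backup_dir, f"cp{len(self.checkpoints)}")
            shutil.copyfile(path, file_copy)      # duplicate of the shared file
            state_copy = copy.deepcopy(job_state)  # stand-in for the process duplicate
            self.checkpoints.append((path, file_copy, state_copy))
        with open(path, "a") as f:
            f.write(text)
        job_state["step"] += 1

    def restore_latest(self):
        # Assumption: restart from the most recent checkpoint.
        path, file_copy, state_copy = self.checkpoints[-1]
        shutil.copyfile(file_copy, path)
        return copy.deepcopy(state_copy)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        shared = os.path.join(tmp, "shared.txt")
        open(shared, "w").close()
        mgr = CheckpointManager([shared], tmp)
        state = {"step": 0}
        mgr.write(shared, "a\n", state)
        mgr.write(shared, "b\n", state)
        # Simulated fault: a partial write corrupts the shared file.
        with open(shared, "a") as f:
            f.write("CORRUPT")
        # Roll back the file and the job state, then redo the failed step.
        state = mgr.restore_latest()
        mgr.write(shared, "b\n", state)
        with open(shared) as f:
            return f.read(), state
```

Calling `demo()` rolls the shared file and the job state back to the last checkpoint and then repeats only the failed write, which is the operator workload the abstract aims to remove. A real system would duplicate an actual operating-system process rather than a dictionary.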