Independent claim |
1. A method for implementing recovery segments in a large-scale computing application, comprising:
sending an application message from a parent process executed by a first computing device to a child process executed by a second computing device, in which a recovery segment comprises the parent process and the child process;
identifying a dependency created by the application message;
including the identified dependency in a dependence set of the child process and saving the dependence set in memory of the second computing device;
generating, by the parent process, a first checkpoint and saving the first checkpoint in nonvolatile memory of the first computing device;
sending, from the parent process to the child process, a checkpoint message that includes dependency information;
receiving, by the child process, the checkpoint message and modifying the dependence set of the child process according to the dependency information;
generating, by the child process, a second checkpoint and saving the second checkpoint in nonvolatile memory of the second computing device; and
upon occurrence of a failure of the parent process, reverting the child process to a most recent checkpoint generated by the child process that does not include effects of processing an orphan message. |
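
The steps of the claim can be illustrated with a small single-process simulation. This is a minimal sketch and not the claimed implementation. The class names `Parent`, `Child`, and `Checkpoint`, the integer "interval" tag on each message, and the tuple message layout are all hypothetical choices made for illustration. Nonvolatile memory is modeled as an in-memory list, and the child's extra local checkpoint is added only so that the orphan case can be shown.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Checkpoint:
    state: int              # application state captured by the checkpoint
    deps: FrozenSet[int]    # dependence set captured with it


@dataclass
class Parent:
    interval: int = 0       # assumed: index of the current, not yet checkpointed interval

    def send(self, payload: int) -> Tuple[str, int, int]:
        # Application message, tagged with the interval in which it was sent.
        return ("app", self.interval, payload)

    def checkpoint(self) -> Tuple[str, int]:
        # First checkpoint: the current interval becomes durable.
        covered = self.interval
        self.interval += 1
        # Checkpoint message carrying dependency information (the covered interval).
        return ("ckpt", covered)

    def fail(self) -> int:
        # On failure, work in the current interval is lost; messages sent in it are orphans.
        return self.interval


@dataclass
class Child:
    state: int = 0
    deps: Set[int] = field(default_factory=set)            # dependence set
    saved: List[Checkpoint] = field(default_factory=list)  # stand-in for nonvolatile memory

    def __post_init__(self) -> None:
        self.take_checkpoint()  # initial checkpoint with an empty dependence set

    def take_checkpoint(self) -> None:
        self.saved.append(Checkpoint(self.state, frozenset(self.deps)))

    def receive(self, msg: tuple) -> None:
        kind, interval = msg[0], msg[1]
        if kind == "app":
            # Identify the dependency created by the message and record it.
            self.deps.add(interval)
            self.state += msg[2]
        else:
            # Checkpoint message: drop dependencies the parent has made durable,
            # then generate the second checkpoint.
            self.deps = {d for d in self.deps if d > interval}
            self.take_checkpoint()

    def recover(self, lost_interval: int) -> Checkpoint:
        # Revert to the most recent checkpoint that does not depend on lost parent work.
        for cp in reversed(self.saved):
            if lost_interval not in cp.deps:
                self.state, self.deps = cp.state, set(cp.deps)
                return cp
        raise RuntimeError("no orphan-free checkpoint found")


def demo() -> Tuple[int, Set[int]]:
    parent, child = Parent(), Child()
    child.receive(parent.send(5))        # child now depends on parent interval 0
    child.receive(parent.checkpoint())   # dependency resolved, child checkpoints state 5
    child.receive(parent.send(7))        # child now depends on parent interval 1
    child.take_checkpoint()              # local checkpoint that includes the later message
    child.recover(parent.fail())         # parent loses interval 1, so that message is an orphan
    return child.state, child.deps


if __name__ == "__main__":
    print(demo())
```

In this scenario the child skips the checkpoint that recorded the effect of the second message and reverts to the earlier one, so its state is 5 and its dependence set is empty.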