Title |
Proactive failure recovery model for distributed computing using a checkpoint frequency determined by a MTBF threshold |
Abstract |
This disclosure generally describes methods and systems, including computer-implemented methods, computer-program products, and computer systems, for providing a proactive failure recovery model for distributed computing. One computer-implemented method includes: building a virtual tree-like computing structure of a plurality of computing nodes; for each computing node of the virtual tree-like computing structure, executing, by a hardware processor, a node failure prediction model to calculate a mean time between failures (MTBF) associated with the computing node; determining whether to perform a checkpoint of the computing node based on a comparison between the calculated MTBF and a maximum threshold and a minimum threshold; migrating a process from the computing node to a different computing node acting as a recovery node; and resuming execution of the process on the different computing node. |
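The abstract does not say how the MTBF is computed or which action each side of the two thresholds triggers. The following minimal Python sketch assumes MTBF is estimated as cumulative uptime divided by the number of observed failures, and assumes a hypothetical three-way policy: at or above the maximum threshold the node is not checkpointed, at or below the minimum threshold it is checkpointed and its process migrated, and in between it is checkpointed in place. It also includes Young's first-order approximation for the checkpoint interval, sqrt(2 * C * M), a standard formula that the patent itself does not cite. The names `estimate_mtbf`, `checkpoint_decision` and `young_checkpoint_interval` and all numbers are illustrative.

```python
import math


def estimate_mtbf(total_uptime_hours: float, failure_count: int) -> float:
    """Estimate MTBF as cumulative uptime divided by the number of observed failures.

    Assumption: a node with no recorded failures is treated as having an unbounded MTBF.
    """
    if failure_count < 0 or total_uptime_hours < 0:
        raise ValueError("uptime and failure count must be non-negative")
    if failure_count == 0:
        return math.inf
    return total_uptime_hours / failure_count


def checkpoint_decision(mtbf: float, min_threshold: float, max_threshold: float) -> str:
    """Hypothetical policy mapping an MTBF estimate onto the minimum and maximum thresholds."""
    if min_threshold > max_threshold:
        raise ValueError("min_threshold must not exceed max_threshold")
    if mtbf >= max_threshold:
        return "no_checkpoint"           # node looks reliable: skip checkpointing
    if mtbf <= min_threshold:
        return "checkpoint_and_migrate"  # failure looks imminent: save state, then move the process
    return "checkpoint"                  # in between: save state, keep running in place


def young_checkpoint_interval(checkpoint_cost_hours: float, mtbf: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval: sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf)


if __name__ == "__main__":
    # Hypothetical nodes: (cumulative uptime in hours, observed failures)
    history = {"n1": (1000.0, 5), "n2": (1000.0, 50), "n3": (1000.0, 0)}
    for node_id, (uptime, failures) in history.items():
        mtbf = estimate_mtbf(uptime, failures)
        print(node_id, mtbf, checkpoint_decision(mtbf, min_threshold=24.0, max_threshold=400.0))
```

With thresholds of 24 h and 400 h, node n1 (MTBF 200 h) is checkpointed in place, n2 (MTBF 20 h) is checkpointed and migrated, and n3 (no recorded failures) is not checkpointed.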
Publication Number |
US9348710(B2) |
Publication Date |
2016.05.24 |
Application Number |
US201414445369 |
Filing Date |
2014.07.29 |
Applicant |
Saudi Arabian Oil Company |
Inventor |
Al-Wahabi Khalid S. |
Classification Codes |
G06F11/00;G06F11/20;G06F11/07;G06F11/14;G06F11/34 |
Main Classification Code |
G06F11/00 |
Agency |
Fish & Richardson P.C. |
Agent |
Fish & Richardson P.C. |
Principal Claim |
1. A computer-implemented method, comprising:
building a virtual tree-like computing structure of a plurality of computing nodes;
for each computing node of the virtual tree-like computing structure, performing, by a hardware processor, a node failure prediction model to calculate a mean time between failure (MTBF) associated with the computing node;
determining whether to perform a checkpoint of the computing node based on a comparison between the calculated MTBF and a maximum and minimum threshold;
migrating a process from the computing node to a different computing node acting as a recovery node; and
resuming execution of the process on the different computing node. |
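The claim does not specify the shape of the tree-like structure or how the recovery node is chosen. The following minimal in-memory simulation assumes a fixed-fanout tree laid out heap-style (node i's parent is node (i - 1) // fanout) and a hypothetical rule that picks, as the recovery node, the tree neighbor (parent, sibling or child) with the highest predicted MTBF. `Node`, `build_tree`, `pick_recovery_node` and `migrate_and_resume` are illustrative names; no real process is moved, only the bookkeeping of which node holds the process and from which checkpointed step it resumes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(eq=False)  # identity semantics: two nodes are never "equal" just because fields match
class Node:
    node_id: str
    mtbf_hours: float  # output of the (hypothetical) node failure prediction model
    parent: Optional[Node] = field(default=None, repr=False)
    children: List[Node] = field(default_factory=list, repr=False)
    processes: Dict[str, int] = field(default_factory=dict)  # process id -> last checkpointed step


def build_tree(node_ids: Sequence[str], mtbfs: Sequence[float], fanout: int = 2) -> List[Node]:
    """Arrange nodes heap-style: node i's parent is node (i - 1) // fanout."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    nodes = [Node(nid, m) for nid, m in zip(node_ids, mtbfs)]
    for i in range(1, len(nodes)):
        parent = nodes[(i - 1) // fanout]
        nodes[i].parent = parent
        parent.children.append(nodes[i])
    return nodes


def pick_recovery_node(node: Node) -> Node:
    """Hypothetical rule: choose the tree neighbor with the highest predicted MTBF."""
    candidates = list(node.children)
    if node.parent is not None:
        candidates.append(node.parent)
        candidates.extend(s for s in node.parent.children if s is not node)
    if not candidates:
        raise RuntimeError(f"no recovery node available for {node.node_id}")
    return max(candidates, key=lambda n: n.mtbf_hours)


def migrate_and_resume(node: Node, pid: str) -> Tuple[Node, int]:
    """Move a process's checkpointed state to the recovery node; return the node and resume step."""
    step = node.processes.pop(pid)  # state saved by the last checkpoint on the failing node
    target = pick_recovery_node(node)
    target.processes[pid] = step    # execution resumes on the recovery node from this step
    return target, step
```

For example, with `build_tree(["root", "a", "b", "c", "d"], [800.0, 20.0, 300.0, 500.0, 100.0])`, node `a` (MTBF 20 h) has parent `root`, sibling `b` and children `c` and `d`, so a process checkpointed on `a` is moved to `root`, the neighbor with the highest MTBF. Restricting the search to tree neighbors keeps recovery local; a production scheduler would likely also weigh load and network distance.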
Address |
Dhahran SA |