Title of Invention: Proactive failure recovery model for distributed computing using a checkpoint frequency determined by a MTBF threshold
Abstract: This disclosure generally describes methods and systems, including computer-implemented methods, computer-program products, and computer systems, for providing a proactive failure recovery model for distributed computing. One computer-implemented method includes: building a virtual tree-like computing structure of a plurality of computing nodes; for each computing node of the virtual tree-like computing structure, performing, by a hardware processor, a node failure prediction model to calculate a mean time between failure (MTBF) associated with the computing node; determining whether to perform a checkpoint of the computing node based on a comparison between the calculated MTBF and a maximum and minimum threshold; migrating a process from the computing node to a different computing node acting as a recovery node; and resuming execution of the process on the different computing node.
Publication No.: US9348710(B2)  Publication Date: 2016.05.24
Application No.: US201414445369  Filing Date: 2014.07.29
Applicant: Saudi Arabian Oil Company  Inventor: Al-Wahabi Khalid S.
Classification Nos.: G06F11/00; G06F11/20; G06F11/07; G06F11/14; G06F11/34  Main Classification No.: G06F11/00
Agency: Fish & Richardson P.C.  Agent: Fish & Richardson P.C.
Main Claim: 1. A computer-implemented method, comprising: building a virtual tree-like computing structure of a plurality of computing nodes; for each computing node of the virtual tree-like computing structure, performing, by a hardware processor, a node failure prediction model to calculate a mean time between failure (MTBF) associated with the computing node; determining whether to perform a checkpoint of the computing node based on a comparison between the calculated MTBF and a maximum and minimum threshold; migrating a process from the computing node to a different computing node acting as a recovery node; and resuming execution of the process on the different computing node.
Address: Dhahran SA
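
Illustrative sketch (not part of the patent record): the steps recited in the main claim can be pictured with a small Python model. The record does not say how the failure prediction model works, how the two thresholds map to actions, or how a recovery node is chosen, so everything here is an assumption made for illustration: MTBF is taken as observed operating time divided by failure count, an MTBF below the minimum threshold triggers a checkpoint plus migration, an MTBF between the thresholds triggers a checkpoint only, and an MTBF above the maximum skips the checkpoint. The names `Node`, `build_tree`, `mtbf`, `decide`, `pick_recovery_node`, and `run_cycle` are hypothetical, and "resuming execution" is reduced to moving process names between nodes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    uptime_hours: float   # observed operating time (hypothetical input)
    failures: int         # observed failure count (hypothetical input)
    processes: List[str] = field(default_factory=list)
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def build_tree(nodes, fanout=2):
    """Arrange nodes into a virtual tree, level by level; nodes[0] is the root."""
    for i, node in enumerate(nodes[1:], start=1):
        parent = nodes[(i - 1) // fanout]
        node.parent = parent
        parent.children.append(node)
    return nodes[0]


def mtbf(node):
    """Stand-in prediction model (assumption): operating time / failure count."""
    if node.failures == 0:
        return float("inf")
    return node.uptime_hours / node.failures


def decide(value, min_threshold, max_threshold):
    """Assumed mapping from the threshold comparison to an action."""
    if value < min_threshold:
        return "migrate"      # failure looks imminent: checkpoint, then migrate
    if value <= max_threshold:
        return "checkpoint"   # moderate risk: checkpoint only
    return "skip"             # node looks reliable: no checkpoint this cycle


def pick_recovery_node(node, min_threshold):
    """Assumed policy: prefer a healthy sibling, then fall back to the parent."""
    if node.parent is None:
        return None
    candidates = [s for s in node.parent.children if s is not node]
    candidates.append(node.parent)
    for candidate in candidates:
        if mtbf(candidate) >= min_threshold:
            return candidate
    return None


def run_cycle(nodes, min_threshold, max_threshold):
    """Evaluate every node once and return a log of the actions taken."""
    log = []
    for node in nodes:
        action = decide(mtbf(node), min_threshold, max_threshold)
        if action == "skip":
            continue
        log.append(("checkpoint", node.name))
        if action == "migrate":
            target = pick_recovery_node(node, min_threshold)
            if target is not None:
                # "Resume" on the recovery node by handing over the process list.
                target.processes.extend(node.processes)
                node.processes.clear()
                log.append(("migrate", node.name, target.name))
    return log


if __name__ == "__main__":
    cluster = [
        Node("n0", 1000.0, 1),
        Node("n1", 1000.0, 50, processes=["job-a"]),
        Node("n2", 1000.0, 4),
    ]
    build_tree(cluster)
    for entry in run_cycle(cluster, min_threshold=100.0, max_threshold=500.0):
        print(entry)
```

In this sketch a lower MTBF leads to more aggressive protection, which is one plausible reading of "checkpoint frequency determined by a MTBF threshold" in the title; the patent's full description may define the relationship differently.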