Title of invention: Tolerating failures using concurrency in a cluster
Abstract: A system and computer program product for tolerating failures using concurrency in a cluster are provided in the illustrative embodiments. A failure is detected in a first computing node serving an application in a cluster. A subset of actions is selected from a set of actions, the set of actions configured to transfer the serving of the application from the first computing node to a second computing node in the cluster. A waiting period is set for the first computing node. The first computing node is allowed to continue serving the application during the waiting period. During the waiting period, concurrently with the first computing node serving the application, the subset of actions is performed at the second computing node. Responsive to receiving a signal of activity from the first computing node during the waiting period, the concurrent operation of the second computing node is aborted.
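Publication number: US9176833(B2) Publication date: 2015.11.03
Application number: US201313939928 Filing date: 2013.07.11
Applicant: GlobalFoundries U.S. 2 LLC Inventors: Griffith Douglas; Jaehde Angela Astrid; Ochs Matthew Ryan
Classification: G06F11/00; G06F11/20 Main classification: G06F11/00
Patent agency: Agent: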
Main claim: 1. A computer usable program product comprising a computer usable storage device including computer usable code for tolerating failures using concurrency in a clustered data processing environment, the computer usable code comprising: computer usable code for detecting a failure in a first computing node, the first computing node serving an application in a cluster of computing nodes; computer usable code for selecting a subset of actions from a set of actions, the set of actions configured to transfer the serving of the application from the first computing node to a second computing node in the cluster; computer usable code for reordering the set of actions such that the subset of actions can be performed before a second subset of actions in the set of actions; computer usable code for setting a waiting period for the first computing node; computer usable code for allowing the first computing node to continue serving the application in the cluster during the waiting period; computer usable code for performing during the waiting period, concurrently with the first computing node serving the application, the subset of actions at the second computing node; and computer usable code for aborting, responsive to receiving a signal of activity from the first computing node during the waiting period, the concurrent operation of the second computing node in the cluster.
Address: Hopewell Junction NY US
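
The control flow recited in the main claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`Action`, `safe_during_wait`, `reorder`, `tolerate_failure`) is hypothetical. It also assumes that a boolean flag is enough to decide which takeover actions may run while the first node is still serving, and that aborting simply means performing no further actions. Undoing actions that have already run is not modeled.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    """One takeover step (hypothetical structure, not from the patent)."""
    name: str
    run: Callable[[], None]
    # Assumption: True if the step can run on the second node while the
    # first node is still serving the application.
    safe_during_wait: bool


def reorder(actions: List[Action]) -> Tuple[List[Action], List[Action]]:
    """Stable partition: the concurrent-safe subset first, the rest second."""
    early = [a for a in actions if a.safe_during_wait]
    late = [a for a in actions if not a.safe_during_wait]
    return early, late


def tolerate_failure(
    actions: List[Action],
    waiting_period_s: float,
    activity: threading.Event,
) -> Tuple[str, List[str]]:
    """Run after a failure is detected on the first node.

    `activity` stands in for the "signal of activity" from the first node.
    Returns the outcome and the names of the actions actually performed.
    """
    early, late = reorder(actions)
    performed: List[str] = []
    deadline = time.monotonic() + waiting_period_s

    # During the waiting period, perform the early subset on the second
    # node, checking before each step whether the first node has shown
    # activity.
    for action in early:
        if activity.is_set():
            return "aborted", performed
        action.run()
        performed.append(action.name)

    # Wait out the rest of the period; Event.wait returns True if the
    # first node signals activity before the timeout expires.
    remaining = max(0.0, deadline - time.monotonic())
    if activity.wait(timeout=remaining):
        return "aborted", performed

    # No activity was seen: complete the transfer with the second subset.
    for action in late:
        action.run()
        performed.append(action.name)
    return "failed_over", performed
```

In this sketch, a caller would set `activity` from whatever mechanism reports that the first node is active again. The early subset then costs nothing if the node recovers, and shortens the takeover if it does not.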