Title of invention: Increasing resiliency of a distributed computing system through lifeboat monitoring
Abstract: A method and associated systems for increasing resiliency of a distributed computing system. A processor creates a first virtual machine and a second virtual machine, each of which monitors the other. When the first virtual machine identifies that the second virtual machine has become unavailable or is otherwise compromised, the first virtual machine automatically requests that a system-management entity restart the unavailable machine. If a certain number of restart attempts fail to restore the second virtual machine to desired functionality, the first virtual machine automatically requests that the system-management entity recreate or reprovision the second virtual machine from a prior backup. If a certain number of such recreation or reprovisioning attempts fail, a system administrator is automatically notified that further action is needed.
Publication number: US9087005(B2) | Publication date: 2015.07.21
Application number: US201313906482 | Filing date: 2013.05.31
Applicant: International Business Machines Corporation | Inventors: Chen Han; Frank Joachim H.; Lei Hui; Maximilien E. Michael; Yang Lin
Classification: G06F11/00; G06F11/14; G06F11/07 | Main classification: G06F11/00
Agency: Schmeiser, Olsen & Watts, LLP | Agent: Schmeiser, Olsen & Watts, LLP; Pivnichny John
Main claim: 1. A method for increasing resiliency of a distributed computing system through lifeboat monitoring, said method comprising: a processor of a computer system provisioning a first virtual machine on a first platform and further provisioning a second virtual machine on a second platform that is distinct from the first platform, wherein: a first agent runs on the first virtual machine and a second agent runs on the second virtual machine, the second agent monitors a first operation of the first virtual machine, and the first agent monitors a second operation of the second virtual machine; the processor receiving notice from the first agent that the second virtual machine is not responsive; the processor taking steps to restart the second virtual machine; the processor, if the steps fail to restart the second virtual machine, receiving further notice from the first agent that the restarting has failed; the processor, in response to the further notice, taking further steps to recreate the second virtual machine; the processor receiving additional notice from the first agent that the recreating has failed; and the processor alerting a system administrator that the recreating has failed.
Address: Armonk, NY, US
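
The escalation path described in the abstract and main claim (restart, then recreate from backup, then alert an administrator) can be illustrated with a minimal Python sketch. This is not the patented implementation: the class name `LifeboatEscalator`, the callback signatures, and the thresholds `max_restarts` and `max_recreates` are all hypothetical, since the record only says "a certain number" of attempts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LifeboatEscalator:
    """Hypothetical escalation policy: restart, then recreate, then alert."""

    restart: Callable[[str], bool]       # assumed: returns True if the VM came back
    recreate: Callable[[str], bool]      # assumed: returns True if reprovisioning worked
    alert_admin: Callable[[str], None]   # assumed: notifies a system administrator
    max_restarts: int = 3                # assumed threshold, not stated in the record
    max_recreates: int = 2               # assumed threshold, not stated in the record
    log: List[str] = field(default_factory=list)

    def handle_unresponsive(self, vm_id: str) -> str:
        """Run the escalation for a VM that its peer agent reported as unresponsive."""
        # Stage 1: ask the management entity to restart the VM.
        for attempt in range(1, self.max_restarts + 1):
            self.log.append(f"restart:{attempt}")
            if self.restart(vm_id):
                return "restarted"
        # Stage 2: restarts exhausted, so recreate the VM from a prior backup.
        for attempt in range(1, self.max_recreates + 1):
            self.log.append(f"recreate:{attempt}")
            if self.recreate(vm_id):
                return "recreated"
        # Stage 3: recreation exhausted, so hand off to a human.
        self.log.append("alert")
        self.alert_admin(vm_id)
        return "escalated"


if __name__ == "__main__":
    alerts: List[str] = []
    escalator = LifeboatEscalator(
        restart=lambda vm: False,    # simulate restarts that never succeed
        recreate=lambda vm: False,   # simulate recreations that never succeed
        alert_admin=alerts.append,
    )
    print(escalator.handle_unresponsive("vm-2"), escalator.log, alerts)
```

In the claimed arrangement each virtual machine's agent watches the other, so under this sketch either agent could trigger `handle_unresponsive` for its peer; the sketch models only the processor-side escalation, not the monitoring itself.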