Title of invention: FAULT RECOVERY ON A MASSIVELY PARALLEL COMPUTER SYSTEM TO HANDLE NODE FAILURES WITHOUT ENDING AN EXECUTING JOB
Abstract: A method and apparatus for fault recovery on a parallel computer system from a soft failure without ending an executing job on a partition of nodes. In preferred embodiments, a failed hardware recovery mechanism on a service node uses a heartbeat monitor to determine when a node failure occurs. Where possible, the failed node is reset and reloaded with software without ending the software job being executed by the partition containing the failed node. © KIPO & WIPO 2009
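The detection and recovery flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class `HeartbeatMonitor`, the function `recover`, the `timeout` parameter, and the `reset_node` callback are all hypothetical names introduced here, and the timeout rule is an assumed way of deciding that a node has failed.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: tracks the last heartbeat time seen per node."""
    timeout: float  # assumed: seconds of silence before a node counts as failed
    last_seen: dict = field(default_factory=dict)

    def beat(self, node_id, now):
        # Record that node_id reported a heartbeat at time `now`.
        self.last_seen[node_id] = now

    def failed_nodes(self, now):
        # A node is treated as failed if it has been silent longer than timeout.
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)


def recover(monitor, now, reset_node):
    """Reset each silent node in place; nothing here ends the running job."""
    recovered = []
    for node in monitor.failed_nodes(now):
        reset_node(node)         # hypothetical hook: reset node and reload software
        monitor.beat(node, now)  # treat the reloaded node as alive again
        recovered.append(node)
    return recovered
```

In this sketch the caller supplies timestamps explicitly, which keeps the logic deterministic; a real service node would presumably read a clock and drive the reset through its own control hardware.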
Publication number: KR20090084897(A)
Publication date: 2009.08.05
Application number: KR20097010832
Application date: 2008.02.01
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: DARRINGTON DAVID; MCCARTHY PATRICK JOSEPH; PETERS AMANDA; SIDELNIK ALBERT
Classification: G06F13/00; G06F1/24; G06F11/16; G06F15/00
Main classification: G06F13/00
Agency:
Agent:
Main claim:
Address: