Title of invention: RISK INDICES FOR ENHANCED THROUGHPUT IN COMPUTING SYSTEMS
Abstract: Embodiments of a system that adjusts a checkpointing frequency in a distributed computing system that executes multiple jobs are described. During operation, the system receives signals associated with the operation of computing nodes in the distributed computing system. Then, the system determines risk metrics for the computing nodes using a pattern-recognition technique to identify anomalous signals in the received signals. Next, the system adjusts a checkpointing frequency of a given checkpoint for a given computing node based on a comparison of a risk metric associated with the given computing node and a threshold, thereby implementing holistic fault tolerance, in which prediction and prevention of potential faults occur across the distributed computing system.
Publication number: US2010011254(A1) Publication date: 2010.01.14
Application number: US20080170239 Filing date: 2008.07.09
Applicant: SUN MICROSYSTEMS, INC. Inventors: VOTTA LAWRENCE G.; WHISNANT KEITH A.; GROSS KENNY C.
Classification: G06F11/34 Main classification: G06F11/34
Agency: Agent:
Main claim:
Address:
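
Illustrative sketch (not part of the patent record): the abstract describes three steps, namely receiving signals from computing nodes, deriving a risk metric by flagging anomalous signals, and adjusting the checkpointing frequency by comparing that metric with a threshold. The Python below is one minimal way those steps might be arranged. The abstract does not specify the pattern-recognition technique, so a simple z-score test stands in for it here. The function names, the default values (`z_limit`, `threshold`, `min_s`, `max_s`), the halving and doubling policy, and the sample readings are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def risk_metric(baseline, recent, z_limit=3.0):
    """Return the fraction of recent samples flagged as anomalous.

    Assumption: a z-score test against a baseline window stands in for
    the unspecified pattern-recognition technique.
    """
    if not recent:
        return 0.0
    mu = mean(baseline)
    sigma = pstdev(baseline)

    def is_anomalous(x):
        # A flat baseline has no spread, so any deviation counts as anomalous.
        if sigma == 0:
            return x != mu
        return abs(x - mu) / sigma > z_limit

    return sum(is_anomalous(x) for x in recent) / len(recent)


def adjust_interval(interval_s, risk, threshold=0.2, min_s=60.0, max_s=3600.0):
    """Return a new checkpoint interval in seconds.

    A shorter interval means a higher checkpointing frequency.
    Assumption: halve the interval when risk exceeds the threshold,
    otherwise double it, clamped to [min_s, max_s].
    """
    if risk > threshold:
        return max(min_s, interval_s / 2)
    return min(max_s, interval_s * 2)


# Hypothetical temperature-like readings for one computing node.
baseline = [50.0, 51.0, 49.0, 50.0, 50.0]
recent = [50.0, 58.0, 59.0, 50.5]

risk = risk_metric(baseline, recent)
new_interval = adjust_interval(600.0, risk)
```

Under these assumptions, a node whose recent readings drift away from its baseline is checkpointed more often, while a node with a low risk metric is checkpointed less often, which leaves more of its time for the jobs themselves.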