Title: Fault containment and error recovery in a scalable multiprocessor
Abstract: A multi-processor computer system permits various types of partitions to be implemented to contain and isolate hardware failures. The partition types include hard, semi-hard, firm, and soft partitions. Each partition can include one or more processors. Upon detecting a failure associated with a processor, the connection to adjacent processors in the system can be severed, thereby precluding corrupted data from contaminating the rest of the system. If an inter-processor connection is severed, message traffic in the system can become congested as messages back up in other processors. Accordingly, each processor includes various timers to monitor for traffic congestion that may be due to a severed connection. Rather than letting a processor wait indefinitely to transmit its messages, the timers expire after preprogrammed time periods and the processor takes appropriate action, such as simply dropping queued messages, to keep the system from locking up.
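The timer behavior described in the abstract can be illustrated with a minimal software sketch. It is not the patented hardware design: the class `OutputPort`, its fields, and the tick-based timing are hypothetical names and simplifications introduced here, assuming a single output queue whose timer counts how long the head message has been blocked by a severed link.

```python
from collections import deque


class OutputPort:
    """Hypothetical model of one inter-processor output port (illustration only)."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks  # assumed preprogrammed expiry period
        self.queue = deque()                # messages waiting to be transmitted
        self.link_up = True                 # False once the connection is severed
        self.stalled_ticks = 0              # how long the queue has been blocked
        self.dropped = 0                    # count of messages discarded so far

    def enqueue(self, msg):
        self.queue.append(msg)

    def tick(self):
        """Advance one time unit and return the message sent, if any."""
        if not self.queue:
            self.stalled_ticks = 0
            return None
        if self.link_up:
            # Normal case: transmit the head message and reset the timer.
            self.stalled_ticks = 0
            return self.queue.popleft()
        # Link is severed: count the stall and drop everything on expiry,
        # so that upstream senders are not blocked forever.
        self.stalled_ticks += 1
        if self.stalled_ticks >= self.timeout_ticks:
            self.dropped += len(self.queue)
            self.queue.clear()
            self.stalled_ticks = 0
        return None
```

In this sketch, dropping the whole queue on expiry is only one of the "appropriate actions" the abstract allows; a real design might choose a different recovery step per timer.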
Publication number: US6678840(B1) Publication date: 2004.01.13
Application number: US20000651949 Filing date: 2000.08.31
Applicant: HEWLETT-PACKARD DEVELOPMENT COMPANY, LP. Inventors: KESSLER RICHARD E.;BANNON PETER J.;GHARACHORLOO KOUROSH;VERGHESE THUKALAN V.
Classification: G06F11/00;G06F15/00;H04L1/00;(IPC1-7):G06F11/00 Main classification: G06F11/00
Agency: Agent:
Main claim:
Address: