Title of invention: Fault-tolerant computer system
Abstract: A system and method for providing a fault-tolerant basis to execute instructions is disclosed. The system comprises an error detector, a rewriting module, a recovery engine, a fault locator and a fallback programming module. The error detector detects a first error in the execution of an instruction in a faulty stage unit of a first pipeline unit. The rewriting module rewrites the instruction to form a rewritten instruction responsive to detecting the first error. The recovery engine executes the rewritten instruction in the first pipeline unit. The error detector determines if a second error occurs in the execution of the rewritten instruction. Responsive to detecting the second error, the recovery engine selects a substitute stage unit for the faulty stage unit from a second pipeline unit. The fault locator locates a faulty component for the faulty stage unit. The fallback programming module establishes a fallback unit for the faulty component.
Publication number: US8898516(B2); Publication date: 2014.11.25
Application number: US201113316314; Filing date: 2011.12.09
Applicant: Toyota Jidosha Kabushiki Kaisha; Inventors: Honda Makoto; Hirano Kanji
Classification: G06F11/00; Main classification: G06F11/00
Agency: Patent Law Works LLP; Agent: Patent Law Works LLP
Principal claim: 1. A method for providing a fault-tolerant basis to execute instructions, the method comprising: detecting a first error in the execution of an instruction in a faulty stage unit of a first pipeline unit in a first core; rewriting the instruction to form a rewritten instruction in response to the detection of the first error; executing the rewritten instruction in the first pipeline unit; determining if a second error occurs in the execution of the rewritten instruction; responsive to determining the occurrence of the second error, selecting a first substitute stage unit for the faulty stage unit from a second pipeline unit in the first core, and re-executing the instruction in the first substitute stage unit of the second pipeline unit and non-faulty stage units of the first pipeline unit in the first core; determining whether a third error occurs in the execution of the instruction in the first substitute stage unit of the second pipeline unit and the non-faulty stage units of the first pipeline unit; responsive to detecting the third error, re-executing the instruction in a second substitute stage unit of a third pipeline unit in a second core; locating a faulty component for the faulty stage unit; and establishing a fallback unit for the faulty component.
Address: Toyota-shi, Aichi-ken JP
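Illustrative sketch: the escalation order recited in claim 1 can be read as a short control flow. The Python below is a minimal, non-authoritative model of that order only. All names (`rewrite`, `make_stage`, `execute_with_recovery`, the step labels) are hypothetical, a stage unit is modeled as a function that returns True when no error is detected, and the point at which the faulty component is located and the fallback unit established is an assumption, since the claim lists those steps without fixing when they run.

```python
from typing import Callable, List

# Hypothetical model: a stage unit returns True on error-free execution
# and False when the error detector flags an error.
StageUnit = Callable[[str], bool]


def rewrite(instruction: str) -> str:
    """Stand-in for the rewriting module; the record does not say how
    the instruction is rewritten, so this only tags it."""
    return f"rewritten({instruction})"


def make_stage(failures: int) -> StageUnit:
    """Build a test stage unit that reports an error `failures` times,
    then executes without error."""
    remaining = [failures]

    def stage(_instruction: str) -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return stage


def execute_with_recovery(
    instruction: str,
    first_stage: StageUnit,
    same_core_substitute: StageUnit,
    other_core_substitute: StageUnit,
) -> List[str]:
    """Return the recovery steps taken, in the order claim 1 recites them."""
    steps = ["execute"]
    if first_stage(instruction):
        return steps  # no first error

    # First error: rewrite the instruction and retry in the same pipeline unit.
    steps.append("rewrite_and_retry")
    if first_stage(rewrite(instruction)):
        return steps  # no second error

    # Second error: re-execute the original instruction using a substitute
    # stage unit from a second pipeline unit in the same core.
    steps.append("substitute_same_core")
    if not same_core_substitute(instruction):
        # Third error: re-execute in a substitute stage unit of a pipeline
        # unit in a second core.
        steps.append("substitute_other_core")
        other_core_substitute(instruction)

    # Assumed placement: diagnosis and fallback follow the escalation.
    steps.append("locate_faulty_component")
    steps.append("establish_fallback_unit")
    return steps
```

In this model, a stage unit that fails only once stops after the rewrite step, while one that keeps failing escalates to a substitute stage unit before the fault is located and a fallback unit is established.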