Title: RESILIENCY TO MEMORY FAILURES IN COMPUTER SYSTEMS
Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
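The store, load, and trap flow in the abstract can be illustrated with a small simulation. This is a minimal sketch under assumptions the abstract does not state: the "error correction information" is taken to be one XOR parity word per fixed-size group of data words, the hardware trap is modeled as a Python exception, and every name (`ResilientMemory`, `UncorrectableMemoryError`, `inject_failure`) is hypothetical.

```python
class UncorrectableMemoryError(Exception):
    """Hypothetical stand-in for the trap raised when a load hits a memory error."""

    def __init__(self, address):
        super().__init__(f"memory error at word {address}")
        self.address = address


class ResilientMemory:
    """Toy model: each group of `group_size` data words is protected by one XOR check word.

    XOR parity is an illustrative assumption; the abstract only says
    "error correction information".
    """

    def __init__(self, num_words, group_size):
        self.group_size = group_size
        self.data = [0] * num_words
        self.checks = [0] * ((num_words + group_size - 1) // group_size)
        self.failed = set()  # words the simulated memory system reports as bad

    def inject_failure(self, address):
        # Simulate a memory failure: the contents are lost and loads now trap.
        self.data[address] = None
        self.failed.add(address)

    def _hardware_load(self, address):
        # Models the load instruction: completes normally unless the word has failed.
        if address in self.failed:
            raise UncorrectableMemoryError(address)
        return self.data[address]

    def _recreate(self, address):
        # Resiliency handler: XOR the group's check word with every other word in the group.
        group = address // self.group_size
        start = group * self.group_size
        end = min(start + self.group_size, len(self.data))
        value = self.checks[group]
        for other in range(start, end):
            if other != address:
                value ^= self._hardware_load(other)  # a second failure in the group propagates
        return value

    def load(self, address):
        try:
            return self._hardware_load(address)
        except UncorrectableMemoryError:
            # "Trap" path: re-create the word and return it as if the load had succeeded.
            return self._recreate(address)

    def store(self, address, value):
        # Generate and store the correction information, then the data itself.
        old = self.load(address)  # re-created if the word being overwritten has failed
        group = address // self.group_size
        self.checks[group] ^= old ^ value  # remove the old value from parity, add the new one
        self.data[address] = value
        self.failed.discard(address)  # a fresh store rewrites the location


mem = ResilientMemory(num_words=8, group_size=4)
for addr, val in enumerate([10, 20, 30, 40, 50, 60, 70, 80]):
    mem.store(addr, val)
mem.inject_failure(2)
print(mem.load(2))  # 30, rebuilt from the check word and words 0, 1 and 3
```

With a single check word per group, this scheme can rebuild at most one failed word per group; a second failure in the same group surfaces as an unhandled `UncorrectableMemoryError`.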
Publication number: US2017068596(A1)  Publication date: 2017.03.09
Application number: US201615357448  Filing date: 2016.11.21
Applicant: Cray Inc.  Inventors: Kaplan Laurence S.; Briggs, III Preston Pengra; Ohlrich Miles Arthur; Leslie Willard Huston
Classification: G06F11/10; G06F3/06  Main classification: G06F11/10
Main claim: 1. A computer-readable storage medium containing computer-executable instructions of an application program interface for providing resiliency to memory accesses of an application program, the instructions comprising: a segment register component that registers a segment of memory that is to be resilient for the application program, the registered segment having a segment descriptor indicating number of data words, number of check words, size of a check group, location of the data words, and location of the check words; a segment reference component that maps the registered segment into the address space of the application program and registers a re-create data word component to process memory errors that occur when the application program accesses the registered segment; and a segment write component that stores a data word in the registered segment by generating a check word for the data word, storing the generated check word, and storing the data word in the registered segment.
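The three components of claim 1 (segment register, segment reference, segment write) and the re-create data word component can be sketched as a small API over a simulated flat address space. This is a hypothetical sketch, not the claimed implementation: the claim does not say how check words are computed, so XOR parity over each check group is assumed; "mapping" the segment is reduced to returning its base index; and the trap path is modeled by calling `on_memory_error` directly. All class, method, and field names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SegmentDescriptor:
    num_data_words: int
    num_check_words: int
    check_group_size: int
    data_location: int   # index of the first data word in the simulated address space
    check_location: int  # index of the first check word


class ResiliencyAPI:
    """Hypothetical API; check word = XOR of the data words in its check group (assumed)."""

    def __init__(self, memory_words: int):
        self.memory = [0] * memory_words  # simulated flat address space of words
        self.segments: List[SegmentDescriptor] = []
        self.handlers: Dict[int, Callable[[int, int], int]] = {}

    def segment_register(self, num_data_words, check_group_size, data_location, check_location):
        # Segment register component: record the descriptor for a resilient segment.
        num_check_words = (num_data_words + check_group_size - 1) // check_group_size
        desc = SegmentDescriptor(num_data_words, num_check_words, check_group_size,
                                 data_location, check_location)
        self.segments.append(desc)
        return len(self.segments) - 1  # segment id

    def segment_reference(self, seg_id):
        # Segment reference component: "map" the segment and register the re-create handler.
        self.handlers[seg_id] = self.recreate_data_word
        return self.segments[seg_id].data_location  # base index the program would use

    def _group_bounds(self, desc, word):
        start = (word // desc.check_group_size) * desc.check_group_size
        return start, min(start + desc.check_group_size, desc.num_data_words)

    def _check_address(self, desc, word):
        return desc.check_location + word // desc.check_group_size

    def segment_write(self, seg_id, word, value):
        # Segment write component: generate the check word, store it, then store the data word.
        desc = self.segments[seg_id]
        start, end = self._group_bounds(desc, word)
        check = value
        for other in range(start, end):
            if other != word:
                check ^= self.memory[desc.data_location + other]
        self.memory[self._check_address(desc, word)] = check
        self.memory[desc.data_location + word] = value

    def recreate_data_word(self, seg_id, word):
        # Re-create data word component: rebuild a word without reading its own location.
        desc = self.segments[seg_id]
        start, end = self._group_bounds(desc, word)
        value = self.memory[self._check_address(desc, word)]
        for other in range(start, end):
            if other != word:
                value ^= self.memory[desc.data_location + other]
        return value

    def on_memory_error(self, seg_id, word):
        # Stand-in for the trap path: dispatch to the handler registered for the segment.
        return self.handlers[seg_id](seg_id, word)


api = ResiliencyAPI(memory_words=16)
seg = api.segment_register(num_data_words=6, check_group_size=3, data_location=0, check_location=8)
base = api.segment_reference(seg)
for i, v in enumerate([7, 1, 4, 9, 2, 5]):
    api.segment_write(seg, i, v)
api.memory[base + 4] = 0xBAD  # simulate a corrupted data word
print(api.on_memory_error(seg, 4))  # 2
```

Here the write path recomputes the check word from the other members of the group instead of XOR-ing out the old value. That avoids reading the word being overwritten, which may itself be the failed location, at the cost of reading the whole check group on every write.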
Address: Seattle, WA, US