Abstract
Some embodiments of the present invention provide a system that performs error detection and correction after a memory component in a memory system has failed. During operation, the system accesses a block of data from the memory system, in which a specific memory component has previously been determined to have failed. Each block of data in the memory system includes an array of bits logically organized into R rows and C columns: a row-checkbit column containing a row checkbit (row-parity bit) for each of the R rows, an inner-checkbit column containing X < R inner checkbits and R−X data bits, and C−2 data-bit columns containing data bits. Each column is stored in a different memory component, and the checkbits are generated from the data bits to provide block-level detection and correction for a failed memory component. Next, the system attempts to correct the column of the block stored in the failed memory component, using the checkbits and the data bits to produce a corrected column. The system then regenerates the row-parity bits and the inner checkbits for the block, which now includes the corrected column, and compares them with the stored row-parity bits and inner checkbits. If the comparison indicates that a double-bit error remains, in which both erroneous bits lie in the same row and one of them lies in the column associated with the failed component, the system flips both erroneous bits to correct the double-bit error.
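The procedure can be illustrated with a minimal Python sketch. The abstract does not specify the inner code, so everything concrete here is an assumption for illustration: toy dimensions R = 8, C = 3, X = 5; row checkbits as even parity over columns 1..C−1; and a hypothetical inner code in which bit (r, c) contributes β_c·α^r in GF(2^5) to the syndrome. The names `encode`, `correct_after_failure`, `h`, and `BETA` are made up for this sketch and are not taken from the patent.

```python
# Hypothetical toy instance; the inner code below is NOT the patent's construction.
R, C, X = 8, 3, 5          # rows, columns, inner checkbits (X < R)
POLY = 0b100101            # x^5 + x^2 + 1, a primitive polynomial for GF(32)
BETA = [0, 1, 7]           # hypothetical per-column multipliers; column 0 (row parity) gets 0


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^5) elements (carry-less multiply reduced by POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b100000:
            a ^= POLY
    return result


ALPHA = [1]                # ALPHA[r] = alpha^r, with alpha = x (0b10)
for _ in range(R - 1):
    ALPHA.append(gf_mul(ALPHA[-1], 2))


def h(r: int, c: int) -> int:
    """Syndrome contribution of bit (r, c); zero for the row-parity column."""
    return gf_mul(BETA[c], ALPHA[r])


def is_inner_checkbit(r: int, c: int) -> bool:
    return c == 1 and r < X


def regen_inner(block) -> int:
    """Regenerate the X inner checkbits (packed into an int) from data bits only."""
    s = 0
    for r in range(R):
        for c in range(1, C):
            if block[r][c] and not is_inner_checkbit(r, c):
                s ^= h(r, c)
    return s


def stored_inner(block) -> int:
    """Read the X stored inner checkbits (rows 0..X-1 of column 1) as an int."""
    return sum(block[r][1] << r for r in range(X))


def encode(data_bits):
    """Place data bits, then fill the inner checkbits and the row-parity column."""
    block = [[0] * C for _ in range(R)]
    slots = [(r, c) for r in range(R) for c in range(1, C) if not is_inner_checkbit(r, c)]
    assert len(data_bits) == len(slots)
    for (r, c), bit in zip(slots, data_bits):
        block[r][c] = bit
    s = regen_inner(block)
    for r in range(X):                 # h(r, 1) = 1 << r, so checkbits are the bits of s
        block[r][1] = (s >> r) & 1
    for r in range(R):                 # row checkbit: even parity over columns 1..C-1
        block[r][0] = sum(block[r][1:]) % 2
    return block


def correct_after_failure(block, failed: int):
    """Return a corrected copy of `block`, given the index of the failed column."""
    b = [row[:] for row in block]
    # 1. Rebuild the failed column: each row XORs to 0 across all C columns.
    for r in range(R):
        b[r][failed] = sum(b[r][c] for c in range(C) if c != failed) % 2
    # 2. Regenerate both kinds of checkbits and compare with the stored ones.
    row_ok = all(b[r][0] == sum(b[r][1:]) % 2 for r in range(R))
    syndrome = regen_inner(b) ^ stored_inner(b)
    if row_ok and syndrome == 0:
        return b
    # 3. A same-row double error, one bit in the failed column, has syndrome
    #    h(r, failed) ^ h(r, j); find that (r, j) and flip both bits.
    for r in range(R):
        for j in range(C):
            if j != failed and h(r, failed) ^ h(r, j) == syndrome:
                b[r][failed] ^= 1
                b[r][j] ^= 1
                return b
    raise ValueError("uncorrectable error pattern")
```

The multipliers were chosen so that step 3 is unambiguous in this toy: with 1 = α^0 and 7 = α^11 (and 1 ⊕ 7 = 6 = α^19), the syndrome exponents for the two possible partner columns of each failed column fall into disjoint ranges ({0..7}, {11..18}, {19..26}). Choosing β_1 = 1 also makes the inner checkbits equal to the bits of the regenerated syndrome, which keeps `encode` trivial. A real design would pick its inner code to meet the same distinctness requirement at production block sizes.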