Title of invention |
Reestablishing synchronization in a memory system |
Abstract |
Embodiments relate to reestablishing synchronization across multiple channels in a memory system. One aspect is a computer-implemented method that includes receiving an out-of-synchronization indication associated with at least one of a plurality of channels in the memory system. A memory control unit in communication with the channels performs a first stage of reestablishing synchronization that includes selectively stopping new traffic on the plurality of channels, waiting for a first time period to expire, resuming traffic on the plurality of channels based on the first time period expiring, and verifying that synchronization is reestablished for a second time period. |
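The first stage summarized in the abstract can be sketched as a stop, wait, resume, verify sequence. This is a minimal illustration only, not the patented implementation: the `Channels` class, its methods, and the tick-based timing are hypothetical stand-ins introduced here, and real hardware would sample a synchronization indicator rather than a prepared list.

```python
from dataclasses import dataclass, field


@dataclass
class Channels:
    """Hypothetical stand-in for the group of memory channels (not from the patent)."""
    in_sync_samples: list          # one bool per verification tick, consumed in order
    traffic_enabled: bool = True
    log: list = field(default_factory=list)

    def stop_new_traffic(self):
        self.traffic_enabled = False
        self.log.append("stop")

    def resume_traffic(self):
        self.traffic_enabled = True
        self.log.append("resume")

    def in_sync(self):
        # Assumption: once the prepared samples run out, the channels stay in sync.
        return self.in_sync_samples.pop(0) if self.in_sync_samples else True


def first_stage(channels, wait_ticks, verify_ticks):
    """Stop new traffic, wait the first period, resume, then verify for the second period."""
    channels.stop_new_traffic()
    for _ in range(wait_ticks):        # first time period, modeled as discrete ticks
        channels.log.append("wait")
    channels.resume_traffic()
    # Synchronization must hold on every tick of the second time period.
    return all(channels.in_sync() for _ in range(verify_ticks))
```

A caller would typically retry `first_stage` a bounded number of times before trying a more invasive recovery step.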
Publication number |
US9594646(B2) |
Publication date |
2017.03.14 |
Application number |
US201615252435 |
Filing date |
2016.08.31 |
Applicant |
INTERNATIONAL BUSINESS MACHINES CORPORATION |
Inventors |
Gilda Glenn D.; Meaney Patrick J.; Papazova Vesselina K.; Dodson John S. |
Classification |
G06F1/12; G06F11/16; G06F3/06; G06F13/16; G06F1/04; G06F1/10 |
Main classification |
G06F1/12 |
Agency |
Cantor Colburn LLP |
Agent |
Cantor Colburn LLP; McNamara Margaret A. |
Independent claim |
1. A computer-implemented method for reestablishing synchronization across multiple channels in a memory system within a computer, the method comprising:
receiving an out-of-synchronization indication associated with at least one of a plurality of channels in the memory system within the computer, wherein the plurality of channels within the computer each provide communication with a memory buffer chip and a plurality of memory devices;
performing, by a memory control unit in communication with the channels within the computer, a first stage of reestablishing synchronization comprising:
    selectively stopping new traffic on the plurality of channels unless a skip counter is enabled and not exceeded;
    decrementing a first time period while waiting for the first time period to expire based on determining that a replay is not in progress;
    resuming traffic on the plurality of channels based on the first time period expiring;
    verifying that synchronization is reestablished for a second time period; and
    based on determining that synchronization is not reestablished for the second time period, repeating the first stage of reestablishing synchronization for a number of times before performing a second stage of reestablishing synchronization;
performing, by the memory control unit, the second stage of reestablishing synchronization comprising:
    stopping new traffic on the plurality of channels;
    waiting for outstanding traffic on the plurality of channels to complete;
    waiting for a write reorder queue empty status indicator from the memory buffer chips;
    decrementing the first time period while waiting for the first time period to expire based on determining that the replay is not in progress, wherein the replay comprises a recovery retransmission sequence from a replay buffer that causes a faulty channel to go out of synchronization with non-faulty instances of the channels;
    resuming traffic on the plurality of channels based on the first time period expiring;
    verifying that synchronization is reestablished for the second time period;
    based on determining that synchronization is not reestablished for the second time period, determining whether a memory buffer chip out-of-sync condition exists; and
    based on determining that the memory buffer chip out-of-sync condition does not exist, repeating the second stage of reestablishing synchronization for a number of times before advancing to a third stage of reestablishing synchronization or declaring a failure; and
performing the third stage of reestablishing synchronization based on determining that the memory buffer chip out-of-sync condition exists, the third stage comprising:
    stopping new traffic on the plurality of channels;
    waiting for outstanding traffic on the plurality of channels to complete;
    waiting for the write reorder queue empty status indicator from the memory buffer chips before sending a synchronization command;
    sending the synchronization command to the memory buffer chips on each of the channels;
    waiting for a third time period to expire;
    verifying that the replay did not occur during the third time period;
    waiting a fourth time period before resuming traffic on the plurality of channels;
    resuming traffic on the plurality of channels;
    verifying that synchronization is reestablished for the second time period; and
    based on determining that synchronization is not reestablished for the second time period, repeating the third stage of reestablishing synchronization for a number of times before declaring the failure. |
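Two mechanisms in the claim lend themselves to a short sketch: the first time period is decremented only while no replay is in progress, and failed verification escalates from stage to stage. The code below is a hedged illustration, not the claimed implementation. The function names, the per-cycle boolean model of replay activity, and the single `max_attempts` limit are hypothetical. The claim allows the second stage, once its retries run out, either to advance to the third stage or to declare a failure; this sketch assumes it advances.

```python
def wait_first_period(ticks, replay_in_progress):
    """Count down `ticks`, decrementing only on cycles with no replay in progress.

    `replay_in_progress` is a hypothetical iterable of per-cycle booleans;
    cycles beyond its end are assumed replay-free. Returns elapsed cycles.
    """
    remaining = ticks
    cycles = 0
    samples = iter(replay_in_progress)
    while remaining > 0:
        cycles += 1
        if not next(samples, False):   # timer holds its value during a replay
            remaining -= 1
    return cycles


def next_step(stage, attempt, max_attempts, synced, buffer_out_of_sync=False):
    """Decide what follows one attempt of `stage` (1, 2, or 3).

    Returns "done", "failure", or a (stage, attempt) tuple for the next try.
    """
    if synced:
        return "done"
    if stage == 2 and buffer_out_of_sync:
        return (3, 1)                  # buffer chip out-of-sync: go to stage 3
    if attempt < max_attempts:
        return (stage, attempt + 1)    # repeat the same stage
    if stage < 3:
        return (stage + 1, 1)          # assumption: exhausted stage 2 advances
    return "failure"                   # stage 3 retries exhausted
```

For example, `next_step(2, 1, 2, synced=False, buffer_out_of_sync=True)` moves directly to the third stage without using up the remaining second-stage retries, which matches the claim's condition for entering the third stage.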
Address |
Armonk NY US |