Title: Reestablishing synchronization in a memory system
Abstract: Embodiments relate to reestablishing synchronization across multiple channels in a memory system. One aspect is a computer-implemented method that includes receiving an out-of-synchronization indication associated with at least one of a plurality of channels in the memory system. A memory control unit in communication with the channels performs a first stage of reestablishing synchronization that includes selectively stopping new traffic on the plurality of channels, waiting for a first time period to expire, resuming traffic on the plurality of channels based on the first time period expiring, and verifying that synchronization is reestablished for a second time period.
Publication No.: US9594646(B2)  Publication date: 2017.03.14
Application No.: US201615252435  Filing date: 2016.08.31
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: Gilda Glenn D.; Meaney Patrick J.; Papazova Vesselina K.; Dodson John S.
Classification: G06F1/12; G06F11/16; G06F3/06; G06F13/16; G06F1/04; G06F1/10  Main classification: G06F1/12
Agency: Cantor Colburn LLP  Agent: Cantor Colburn LLP; McNamara Margaret A.
Main claim: 1. A computer-implemented method for reestablishing synchronization across multiple channels in a memory system within a computer, the method comprising:
receiving an out-of-synchronization indication associated with at least one of a plurality of channels in the memory system within the computer, wherein the plurality of channels within the computer each provide communication with a memory buffer chip and a plurality of memory devices;
performing, by a memory control unit in communication with the channels within the computer, a first stage of reestablishing synchronization comprising:
  selectively stopping new traffic on the plurality of channels unless a skip counter is enabled and not exceeded;
  decrementing a first time period while waiting for the first time period to expire based on determining that a replay is not in progress;
  resuming traffic on the plurality of channels based on the first time period expiring;
  verifying that synchronization is reestablished for a second time period; and
  based on determining that synchronization is not reestablished for the second time period, repeating the first stage of reestablishing synchronization for a number of times before performing a second stage of reestablishing synchronization;
performing, by the memory control unit, the second stage of reestablishing synchronization comprising:
  stopping new traffic on the plurality of channels;
  waiting for outstanding traffic on the plurality of channels to complete;
  waiting for a write reorder queue empty status indicator from the memory buffer chips;
  decrementing the first time period while waiting for the first time period to expire based on determining that the replay is not in progress, wherein the replay comprises a recovery retransmission sequence from a replay buffer that causes a faulty channel to go out of synchronization with non-faulty instances of the channels;
  resuming traffic on the plurality of channels based on the first time period expiring;
  verifying that synchronization is reestablished for the second time period;
  based on determining that synchronization is not reestablished for the second time period, determining whether a memory buffer chip out-of-sync condition exists; and
  based on determining that the memory buffer chip out-of-sync condition does not exist, repeating the second stage of reestablishing synchronization for a number of times before advancing to a third stage of reestablishing synchronization or declaring a failure; and
performing the third stage of reestablishing synchronization based on determining that the memory buffer chip out-of-sync condition exists, the third stage comprising:
  stopping new traffic on the plurality of channels;
  waiting for outstanding traffic on the plurality of channels to complete;
  waiting for the write reorder queue empty status indicator from the memory buffer chips before sending a synchronization command;
  sending the synchronization command to the memory buffer chips on each of the channels;
  waiting for a third time period to expire;
  verifying that the replay did not occur during the third time period;
  waiting a fourth time period before resuming traffic on the plurality of channels;
  resuming traffic on the plurality of channels;
  verifying that synchronization is reestablished for the second time period; and
  based on determining that synchronization is not reestablished for the second time period, repeating the third stage of reestablishing synchronization for a number of times before declaring the failure.
Address: Armonk NY US
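
The three-stage escalation recited in the main claim can be illustrated with a small software simulation. The sketch below is not the patented hardware logic: `FakeChannels`, `recover`, the retry counts, the tick-based timer, and the string step names are all hypothetical, chosen only to make the control flow visible. It assumes one reading of the skip counter (the first `skip_limit` stage-one attempts leave traffic running) and takes the "declaring a failure" branch when stage two exhausts its retries without a memory buffer chip out-of-sync condition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeChannels:
    """Hypothetical test double for the channels and memory buffer chips."""
    fixed_by_stage: int                   # first stage whose verification passes
    buffer_chip_out_of_sync: bool = False
    replay_polls_left: int = 0            # polls during which a replay is active
    log: List[str] = field(default_factory=list)

    def step(self, action: str) -> None:
        # Record the action instead of driving real hardware.
        self.log.append(action)

    def replay_in_progress(self) -> bool:
        if self.replay_polls_left > 0:
            self.replay_polls_left -= 1
            return True
        return False

    def verify_sync(self, stage: int) -> bool:
        # Stands in for "synchronization holds for the second time period".
        return stage >= self.fixed_by_stage


def wait_first_period(hw: FakeChannels, ticks: int) -> None:
    # The first time period only decrements while no replay is in progress.
    while ticks > 0:
        if not hw.replay_in_progress():
            ticks -= 1


def stage1(hw: FakeChannels, period: int, skip_stop: bool) -> bool:
    if not skip_stop:
        hw.step("stop_new_traffic")
    wait_first_period(hw, period)
    hw.step("resume_traffic")
    return hw.verify_sync(1)


def stage2(hw: FakeChannels, period: int) -> bool:
    hw.step("stop_new_traffic")
    hw.step("wait_outstanding_traffic")
    hw.step("wait_write_reorder_queue_empty")
    wait_first_period(hw, period)
    hw.step("resume_traffic")
    return hw.verify_sync(2)


def stage3(hw: FakeChannels) -> bool:
    for action in ("stop_new_traffic", "wait_outstanding_traffic",
                   "wait_write_reorder_queue_empty", "send_sync_command",
                   "wait_third_period", "verify_no_replay_in_third_period",
                   "wait_fourth_period", "resume_traffic"):
        hw.step(action)
    return hw.verify_sync(3)


def recover(hw: FakeChannels, retries: int = 2, period: int = 4,
            skip_limit: int = 0) -> str:
    for attempt in range(retries):
        if stage1(hw, period, skip_stop=attempt < skip_limit):
            return "synced_stage1"
    for _ in range(retries):
        if stage2(hw, period):
            return "synced_stage2"
        if hw.buffer_chip_out_of_sync:
            break                         # escalate to the third stage
    else:
        return "failure"                  # assumption: fail rather than escalate
    for _ in range(retries):
        if stage3(hw):
            return "synced_stage3"
    return "failure"
```

For example, `recover(FakeChannels(fixed_by_stage=3, buffer_chip_out_of_sync=True))` runs stage one twice, stage two once, and then reaches stage three, where the synchronization command is logged before traffic resumes.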