Title of invention: Dynamically updating routing information while avoiding deadlocks and preserving packet order after a link error
Abstract: A system for allowing dynamic changing of routing information of a network interconnect while avoiding deadlocks and preserving packet ordering. A network resiliency system detects when an error in the network interconnect occurs and dynamically generates new routing information for the routers that factors in the detected error. The network resiliency system then directs the network interconnect to enter a quiescent state in which no packets are transiting through the network interconnect. After the network interconnect enters the quiescent state, the network resiliency system directs the loading of the new routing information into the routing tables of the network interconnect and then directs the network interconnect to start injecting request packets into the network interconnect.
Publication number: US8854951(B2) Publication date: 2014.10.07
Application number: US201113104778 Filing date: 2011.05.10
Applicant: Cray Inc. Inventors: Godfrey Aaron F.; Johns Christopher B.
Classification: G01R31/08; H04L12/757; H04L12/24 Main classification: G01R31/08
Agency: Perkins Coie LLP Agent: Perkins Coie LLP
Principal claim: 1. A method for recovering from a failure in a network interconnect having processors connected via routers, the method comprising: upon detecting a failure, requesting a compute routing information component to compute new routing information for the network interconnect; requesting a quiesce component to suppress the injecting of request packets into the network interconnect while allowing response packets to be injected into the network interconnect and in-transit request packets to continue through the network interconnect; and after the injection of request packets is suppressed, waiting for the network interconnect to enter a quiescent state in which neither request packets nor response packets are in transit through the network interconnect; and after the network interconnect enters the quiescent state, requesting an install routing information component to install the new routing information into the network interconnect; and after the new routing information is installed into the network interconnect, requesting an unquiesce component to allow the injecting of request packets into the network interconnect wherein each processor is connected to a router through a network interface controller, each processor is connected to a local controller through a network that is out-of-band from the network interconnect, and the suppressing of the injecting of request packets includes the local controller notifying a program executing on the processor to not send requests for injection of request packets into the network interconnect.
Address: Seattle, WA, US
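Illustrative sketch (not part of the patent record): the ordering recited in claim 1 (compute new routes, quiesce, wait for the interconnect to drain, install, unquiesce) can be modeled with a toy simulation. Every name here (`Interconnect`, `compute_routes`, `recover`, the link labels) is a hypothetical stand-in chosen for illustration, and route computation is reduced to dropping the failed link. The out-of-band local controller that notifies programs on each processor is not modeled; a single flag stands in for it.

```python
from dataclasses import dataclass, field


@dataclass
class Interconnect:
    """Toy model of a network interconnect (hypothetical, for illustration)."""
    routing_table: dict = field(default_factory=dict)
    requests_allowed: bool = True   # False while request injection is suppressed
    requests_in_transit: int = 0
    responses_in_transit: int = 0
    log: list = field(default_factory=list)

    def inject_request(self) -> bool:
        """Try to inject a request packet; refused while quiesced."""
        if not self.requests_allowed:
            return False
        self.requests_in_transit += 1
        return True

    def step(self) -> None:
        """Advance one packet: a delivered request triggers a response,
        which may still be injected while requests are suppressed."""
        if self.requests_in_transit > 0:
            self.requests_in_transit -= 1
            self.responses_in_transit += 1
        elif self.responses_in_transit > 0:
            self.responses_in_transit -= 1

    def is_quiescent(self) -> bool:
        """Quiescent: no request or response packets are in transit."""
        return self.requests_in_transit == 0 and self.responses_in_transit == 0


def compute_routes(failed_link, links):
    """Placeholder route computation: keep every link except the failed one."""
    return {link: "up" for link in links if link != failed_link}


def recover(net, failed_link, links):
    """Apply the claim-1 ordering to the toy interconnect."""
    new_routes = compute_routes(failed_link, links)
    net.log.append("computed")

    net.requests_allowed = False          # quiesce: suppress new requests
    net.log.append("quiesced")

    while not net.is_quiescent():         # in-transit traffic keeps flowing
        net.step()
    net.log.append("drained")

    net.routing_table = new_routes        # install only once nothing is in flight
    net.log.append("installed")

    net.requests_allowed = True           # unquiesce: requests may flow again
    net.log.append("unquiesced")
    return net
```

In this sketch the routing table is swapped only after both counters reach zero, so no packet is ever forwarded partly under the old routes and partly under the new ones, which is the property the claim relies on to avoid deadlock and reordering.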