Title: Concurrent repair of PCIE switch units in a tightly-coupled, multi-switch, multi-adapter, multi-host distributed system
Abstract: Techniques are disclosed to perform an operation to facilitate concurrent repair of PCIe switch units in processing environments such as a tightly coupled, multi-switch, multi-adapter, multi-host distributed system. The operation, for an identified switch unit to be repaired, reconfigures all switch unit hardware in the switch fabric by removing all upstream to downstream connections utilizing the identified switch unit. Connections to hosts via the upstream ports are also removed by the operation. Once the switch unit is powered back on, the operation reconfigures all switch unit hardware in the switch fabric by adding all upstream to downstream connections utilizing the identified switch unit. The operation further restores connections to hosts via the upstream ports.
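The reconfiguration described in the abstract can be pictured as bookkeeping over the fabric's connection table. The following is a minimal sketch only, not the patented implementation: the `Connection` and `Fabric` types, the port-naming scheme, and the idea of "parking" removed connections for later restoration are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connection:
    """One upstream-to-downstream path through the fabric (hypothetical model)."""
    upstream: str        # assumed port id, e.g. "host0:up0"
    downstream: str      # assumed port id, e.g. "adapter0:down0"
    via: frozenset       # names of the switch units the path traverses


@dataclass
class Fabric:
    """Tracks active connections and those set aside while a unit is repaired."""
    active: set = field(default_factory=set)
    parked: dict = field(default_factory=dict)

    def vary_off(self, unit: str) -> set:
        # Remove every connection that utilizes the unit being repaired.
        affected = {c for c in self.active if unit in c.via}
        self.active -= affected
        self.parked[unit] = affected
        return affected

    def vary_on(self, unit: str) -> set:
        # Re-add the connections that were removed for this unit, if any.
        restored = self.parked.pop(unit, set())
        self.active |= restored
        return restored


if __name__ == "__main__":
    fabric = Fabric()
    fabric.active = {
        Connection("host0:up0", "adapter0:down0", frozenset({"sw0", "sw1"})),
        Connection("host1:up0", "adapter1:down0", frozenset({"sw2"})),
    }
    print(len(fabric.vary_off("sw1")), "connection(s) removed for repair")
    print(len(fabric.vary_on("sw1")), "connection(s) restored after repair")
```

Connections that do not traverse the identified unit stay in `active` throughout, which is the sense in which the repair is concurrent with normal operation of the rest of the fabric.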
Publication number: US8843688(B2) Publication date: 2014.09.23
Application number: US201213609561 Filing date: 2012.09.11
Applicant: International Business Machines Corporation Inventors: Engebretsen David R.; Holthaus Brian G.; Kaus Jonathan L.; Thiemann Eric G.; Todd Robert W.
Classification: G06F13/40; G06F9/44; G06F13/42 Main classification: G06F13/40
Agency: Patterson & Sheridan, LLP Agent: Patterson & Sheridan, LLP
Main claim: 1. A computer program product to repair switch units in a distributed switch comprising a plurality of switch units, each switch unit of the plurality having at least one port for establishing connections according to a predefined interface, the computer program product comprising: a non-transitory computer readable storage medium having computer-readable program code embodied therewith, the computer-readable program code comprising: computer-readable program code configured to, responsive to receiving an indication to vary off a first switch unit of the plurality of switch units: identify a host connected to the first switch unit; transmit a first removal indication to the host to remove a connection between the host and the first switch unit; and upon determining that: (i) the host is connected to the first switch unit through a downstream port of the first switch unit and (ii) the host has not acknowledged the first removal indication within a predefined amount of time, transmitting a second removal indication by operation of one or more computer processors when executing the computer-readable program code, wherein the second removal indication emulates that the first switch unit is physically removed from the distributed switch, wherein the first switch unit is not physically removed from the distributed switch; and computer-readable program code configured to, responsive to receiving an indication to vary on the first switch unit: transmit an add indication to the host to establish a connection between the host and the first switch unit.
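The vary-off and vary-on sequence recited in the claim can be sketched as a small control routine. This is an illustrative sketch under stated assumptions, not the claimed program code: the `Indication` names, the `send` and `wait_for_ack` callbacks, and the 5-second default timeout are hypothetical stand-ins for whatever messaging and "predefined amount of time" a real system would use.

```python
from enum import Enum


class Indication(Enum):
    """Hypothetical message kinds; the claim does not name them."""
    REMOVE_REQUEST = "remove_request"      # first removal indication
    EMULATED_REMOVAL = "emulated_removal"  # second: emulates physical removal
    ADD = "add"                            # add indication on vary on


def vary_off_host(send, wait_for_ack, host, via_downstream_port, timeout_s=5.0):
    """Ask a host to drop its connection; escalate if it does not acknowledge.

    send(host, indication) and wait_for_ack(host, timeout_s) -> bool are
    caller-supplied callbacks (assumed interfaces, not from the source).
    Returns the list of indications that were transmitted.
    """
    sent = [Indication.REMOVE_REQUEST]
    send(host, Indication.REMOVE_REQUEST)
    acked = wait_for_ack(host, timeout_s)
    # Escalate only when both claim conditions hold: the host is attached
    # through a downstream port of the unit AND it has not acknowledged in time.
    if via_downstream_port and not acked:
        send(host, Indication.EMULATED_REMOVAL)
        sent.append(Indication.EMULATED_REMOVAL)
    return sent


def vary_on_host(send, host):
    """Tell the host to re-establish its connection once the unit is back."""
    send(host, Indication.ADD)
    return [Indication.ADD]


if __name__ == "__main__":
    log = []
    record = lambda host, indication: log.append((host, indication.value))
    never_acks = lambda host, timeout_s: False  # simulate an unresponsive host
    vary_off_host(record, never_acks, "hostA", via_downstream_port=True)
    vary_on_host(record, "hostA")
    for entry in log:
        print(entry)
```

Passing the transport in as callbacks keeps the escalation logic separate from any particular messaging mechanism, so the two claim conditions can be exercised with simple stubs.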
Address: Armonk NY US