Title of invention: Message flow control in a multi-node computer system
Abstract: Embodiments of the invention provide for controlling message flow across a parallel computer system having multiple compute nodes by selectively grouping compute nodes of such a system into node pools and assigning message flow control policies to nodes in the node pools. The message flow control policies specify logging and/or tracing activities to be performed by instances of applications running on nodes assigned to the node pools. As the application is executed, logging and/or tracing messages are generated on the compute nodes according to message flow control policies assigned to the nodes. Optionally, the message flow is analyzed, the message flow control policies are adjusted, and duplicate messages are eliminated.
Application publication number: US9514023(B2)  Application publication date: 2016.12.06
Application number: US200812144783  Filing date: 2008.06.24
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: Barsness Eric L.; Darrington David L.; Peters Amanda; Santosuosso John M.
Classification: G06F15/16; G06F11/34; G06F11/07  Main classification: G06F15/16
Agency: Patterson + Sheridan, LLP  Agent: Patterson + Sheridan, LLP
Principal claim: 1. A computer-implemented method for controlling message flow in a parallel computing system having a plurality of compute nodes, the method comprising: assigning a first set of compute nodes to a first node pool; assigning a first message flow control policy to at least two compute nodes of the first node pool, wherein the first message flow control policy specifies at least one logging activity to be performed by an instance of an application running on each of the at least two compute nodes of the first node pool, and wherein subsequent modifications to the assigned first message flow control policy affect one or more of the at least one logging activities performed by each instance of the application running on the at least two compute nodes; initiating execution of the application on each of the compute nodes in the first node pool; while executing the application on the at least two compute nodes of the first node pool, generating a plurality of logging messages according to the first message flow control policy; and upon determining that two or more of the at least two compute nodes of the first node pool are generating duplicate error messages based on content of the plurality of logging messages: assigning a selected one of the two or more compute nodes to a second node pool; and assigning a second message flow control policy corresponding to the second node pool to the selected compute node, wherein the second message flow control policy is distinct from the first message flow control policy, and wherein logging activity performed by the instance of the application running on the selected compute node is controlled by the second message flow control policy rather than the first message flow control policy.
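The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. All names here (`Policy`, `NodePool`, `generate_logs`, `find_duplicate_error_nodes`, `reassign_one`) are hypothetical, as are the choices to model a policy as a set of permitted log levels, to treat two error messages as duplicates when their text is identical, to select the node with the lowest identifier, and to make the second policy a silent one. The claim only requires that the second policy be distinct from the first.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: the log levels a node is allowed to emit."""
    name: str
    levels: frozenset


@dataclass
class NodePool:
    """A named group of compute nodes that share one policy."""
    name: str
    policy: Policy
    nodes: set = field(default_factory=set)


def generate_logs(pool, events):
    """Keep only the (node, level, text) events that the pool's policy allows."""
    return [e for e in events if e[0] in pool.nodes and e[1] in pool.policy.levels]


def find_duplicate_error_nodes(messages):
    """Return nodes whose ERROR text was also emitted by at least one other node."""
    nodes_by_text = defaultdict(set)
    for node, level, text in messages:
        if level == "ERROR":
            nodes_by_text[text].add(node)
    duplicates = set()
    for nodes in nodes_by_text.values():
        if len(nodes) >= 2:
            duplicates |= nodes
    return duplicates


def reassign_one(first, second, duplicate_nodes):
    """Move one selected duplicate-producing node into the second pool."""
    if not duplicate_nodes:
        return None
    selected = min(duplicate_nodes)  # assumption: any deterministic choice works
    first.nodes.discard(selected)    # assumption: a node belongs to one pool only
    second.nodes.add(selected)
    return selected


verbose = Policy("verbose", frozenset({"INFO", "ERROR"}))
quiet = Policy("quiet", frozenset())  # assumption: second policy logs nothing
first = NodePool("pool-1", verbose, {"n0", "n1", "n2"})
second = NodePool("pool-2", quiet)

events = [
    ("n0", "ERROR", "disk full"),
    ("n1", "ERROR", "disk full"),
    ("n2", "INFO", "step done"),
    ("n2", "DEBUG", "x=1"),
]

logged = generate_logs(first, events)
moved = reassign_one(first, second, find_duplicate_error_nodes(logged))
after = generate_logs(first, events) + generate_logs(second, events)
```

In this sketch both `n0` and `n1` report the same error under the first policy, so one of them is moved to the second pool. Replaying the same events afterwards yields a single copy of the error message, because the moved node is now governed by the second policy.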
Address: Armonk, NY, US