Title Confluence analysis and loop fast-forwarding for improving SIMD execution efficiency
Abstract One embodiment of the present invention sets forth a method for causing thread convergence. The method includes determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node. The method also includes determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node. The method further includes identifying an external node and inserting a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node. The method additionally includes inserting into the program a second divergent node configured to cause various threads to execute or not execute a first control flow path associated with the external node.
Publication number US9612811(B2) Publication date 2017.04.04
Application number US201414160426 Filing date 2014.01.21
Applicant NVIDIA Corporation Inventors Sabne Amit Jayant; Lin Yuan; Grover Vinod
Classification G06F9/45 Main classification G06F9/45
Agency Artegis Law Group, LLP Agent Artegis Law Group, LLP
Main claim 1. A method for causing thread convergence in a parallel execution environment, wherein the method is implemented by a compiler executing on a processor, the method comprising: determining, by the compiler executing on the processor, that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node; determining, by the compiler executing on the processor, that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node; identifying, by the compiler executing on the processor, an external node in the control flow graph; inserting into the program, by the compiler executing on the processor, a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node; and inserting into the program, by the compiler executing on the processor, a second divergent node that is configured to cause the first set of threads to execute a first control flow path associated with the external node and to cause a second set of threads for which the predicate variable is set to false to not execute the first control flow path.
Address Santa Clara CA US
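
Illustrative note: the two "determining" steps of the main claim can be sketched on a toy control flow graph. The following is a minimal Python sketch, not the patented compiler implementation. The graph encoding (a dict of successor lists), the node names, and the function names `dominators`, `has_two_disjoint_paths`, and `is_candidate` are all assumptions made for illustration. The sketch covers only the analysis; identifying the external node and inserting the predicate instruction and second divergent node are not shown.

```python
from itertools import combinations

def dominators(succ, entry):
    """Iterative dataflow: dom[n] = {n} | intersection of dom[p] over predecessors p."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, outs in succ.items():
        for s in outs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def reverse(succ):
    """Reverse every edge; post-dominators are dominators of the reversed graph."""
    rev = {n: [] for n in succ}
    for n, outs in succ.items():
        for s in outs:
            rev[s].append(n)
    return rev

def simple_paths(succ, src, dst, path=()):
    """Enumerate cycle-free paths from src to dst (fine for small graphs only)."""
    path = path + (src,)
    if src == dst:
        yield path
        return
    for s in succ[src]:
        if s not in path:
            yield from simple_paths(succ, s, dst, path)

def has_two_disjoint_paths(succ, div, cand):
    """True if two distinct paths div -> cand share no interior node."""
    paths = set(simple_paths(succ, div, cand))
    return any(set(p[1:-1]).isdisjoint(q[1:-1]) for p, q in combinations(paths, 2))

def is_candidate(succ, entry, exit_, div, cand):
    """Both claim conditions: disjoint paths exist, and dominance or post-dominance fails."""
    dom = dominators(succ, entry)
    pdom = dominators(reverse(succ), exit_)
    if not has_two_disjoint_paths(succ, div, cand):
        return False
    return div not in dom[cand] or cand not in pdom[div]

# Hypothetical CFG: A diverges to B and C, both reach D, but B can also leave via E.
unstructured = {"entry": ["A"], "A": ["B", "C"], "B": ["D", "E"],
                "C": ["D"], "D": ["exit"], "E": ["exit"], "exit": []}
# Plain if/else diamond: A dominates D and D post-dominates A.
diamond = {"entry": ["A"], "A": ["B", "C"], "B": ["D"],
           "C": ["D"], "D": ["exit"], "exit": []}

print(is_candidate(unstructured, "entry", "exit", "A", "D"))
print(is_candidate(diamond, "entry", "exit", "A", "D"))
```

In the first graph, node E plays the role the claim assigns to an external node: threads that take the edge from B to E bypass D, so D does not post-dominate A. In the plain diamond both dominance relations hold, so the second determining step fails and the claimed transformation would not apply.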