Title: CONFLUENCE ANALYSIS AND LOOP FAST-FORWARDING FOR IMPROVING SIMD EXECUTION EFFICIENCY
Abstract: One embodiment of the present invention sets forth a method for causing thread convergence. The method includes determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node. The method also includes determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node. The method further includes identifying an external node and inserting a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node. The method additionally includes inserting into the program a second divergent node configured to cause various threads to execute or not execute a first control flow path associated with the external node.
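The dominator and post-dominator test named in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the function names (`dominators`, `reverse`, `needs_transformation`), the dictionary-based graph encoding, and both example graphs are assumptions made for illustration, and the textbook iterative dataflow algorithm stands in for whatever analysis an actual compiler would use.

```python
def dominators(succ, entry):
    """Return {node: set of nodes that dominate it} via iterative dataflow.

    succ maps every node to its list of successors (hypothetical encoding).
    """
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, outs in succ.items():
        for s in outs:
            pred[s].add(n)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(succ):
    """Reverse every edge; post-dominators are dominators of the reversed graph."""
    rev = {n: [] for n in succ}
    for n, outs in succ.items():
        for s in outs:
            rev[s].append(n)
    return rev


def needs_transformation(succ, entry, exit_node, divergent, candidate):
    """True if divergent does not dominate candidate, or candidate does not
    post-dominate divergent (the condition stated in the abstract)."""
    dom = dominators(succ, entry)
    pdom = dominators(reverse(succ), exit_node)
    return divergent not in dom[candidate] or candidate not in pdom[divergent]


# Hypothetical graph: X is an "external" node that reaches D without passing A.
with_external = {
    "entry": ["A", "X"], "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "X": ["D"], "D": ["exit"], "exit": [],
}
# Plain diamond: A dominates D and D post-dominates A.
diamond = {
    "entry": ["A"], "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["exit"], "exit": [],
}

print(needs_transformation(with_external, "entry", "exit", "A", "D"))  # True
print(needs_transformation(diamond, "entry", "exit", "A", "D"))        # False
```

In the first graph the path entry, X, D bypasses A, so A is not a dominator of D and the condition holds; in the plain diamond it does not.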
Publication number: US2015205590(A1) Publication date: 2015.07.23
Application number: US201414160426 Filing date: 2014.01.21
Applicant: NVIDIA CORPORATION Inventors: SABNE Amit Jayant; LIN Yuan; GROVER Vinod
Classification: G06F9/45 Main classification: G06F9/45
Agency: Agent:
Main claim: 1. A computer-implemented method for causing thread convergence in a parallel execution environment, the method comprising: determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node; determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node; identifying an external node in the control flow graph; inserting into the program a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node; and inserting into the program a second divergent node that is configured to cause the first set of threads to execute a first control flow path associated with the external node and to cause a second set of threads for which the predicate variable is set to false to not execute the first control flow path.
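The last two steps of the claim (setting a predicate, then branching on it) can be pictured with a toy simulation. This is only a sketch under stated assumptions: `run_warp`, the per-thread boolean input, and the trace labels are hypothetical names, and a Python loop stands in for lock-step SIMD execution, which the record does not describe in detail.

```python
def run_warp(takes_external):
    """Simulate a group of threads passing through the two inserted constructs.

    takes_external[i] is True if thread i is to execute the external node
    (hypothetical input; the claim does not say how this is determined).
    Returns the per-thread predicate values and the nodes each thread executed.
    """
    trace = [[] for _ in takes_external]

    # First inserted instruction: predicate is true for threads that are
    # to execute the external node, false for all others.
    predicate = [bool(t) for t in takes_external]

    # Second divergent node: only threads whose predicate is true execute
    # the control flow path associated with the external node.
    for tid, p in enumerate(predicate):
        if p:
            trace[tid].append("external")

    # In this toy model every thread then reaches the candidate node together.
    for t in trace:
        t.append("candidate")
    return predicate, trace


predicate, trace = run_warp([True, False, True, False])
print(predicate)  # [True, False, True, False]
print(trace[0])   # ['external', 'candidate']
print(trace[1])   # ['candidate']
```

Threads 0 and 2 form the first set and run the external path; threads 1 and 3 form the second set, skip it, and arrive at the candidate node directly.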
Address: Santa Clara CA US