Title of Invention |
CONFLUENCE ANALYSIS AND LOOP FAST-FORWARDING FOR IMPROVING SIMD EXECUTION EFFICIENCY |
Abstract |
One embodiment of the present invention sets forth a method for causing thread convergence. The method includes determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node. The method also includes determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node. The method further includes identifying an external node and inserting a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node. The method additionally includes inserting into the program a second divergent node configured to cause various threads to execute or not execute a first control flow path associated with the external node. |
Publication Number |
US2015205590(A1) |
Publication Date |
2015.07.23 |
Application Number |
US201414160426 |
Filing Date |
2014.01.21 |
Applicant |
NVIDIA CORPORATION |
Inventors |
SABNE Amit Jayant; LIN Yuan; GROVER Vinod |
Classification |
G06F9/45 |
Main Classification |
G06F9/45 |
Agency |
|
Agent |
|
Independent Claim |
1. A computer-implemented method for causing thread convergence in a parallel execution environment, the method comprising:
determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node;
determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node;
identifying an external node in the control flow graph;
inserting into the program a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node; and
inserting into the program a second divergent node that is configured to cause the first set of threads to execute a first control flow path associated with the external node and to cause a second set of threads for which the predicate variable is set to false to not execute the first control flow path. |
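
The two determining steps of the claim above can be illustrated with a small control flow graph check. The following is a minimal sketch, not the patented implementation: the function names, the adjacency-list graph encoding, and the example graphs are all hypothetical, and "non-overlapping" is assumed here to mean that the two paths share no intermediate nodes. It does not cover identifying the external node or inserting the predicate instruction and second divergent node.

```python
def dominators(succ, entry):
    """Iterative dominator sets: dom[n] holds every node found on all paths entry -> n."""
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in pred[n]]
            new = (set.intersection(*incoming) if incoming else set()) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(succ):
    """Reverse every edge, so post-dominators can be computed as dominators."""
    rev = {n: [] for n in succ}
    for n, ss in succ.items():
        for s in ss:
            rev.setdefault(s, []).append(n)
    return rev


def simple_paths(succ, src, dst, seen=()):
    """Yield every cycle-free path from src to dst (exponential; small graphs only)."""
    if src == dst:
        yield (*seen, src)
        return
    for nxt in succ.get(src, []):
        if nxt not in seen and nxt != src:
            yield from simple_paths(succ, nxt, dst, (*seen, src))


def has_two_disjoint_paths(succ, divergent, candidate):
    """Assumed reading of 'non-overlapping': no shared intermediate nodes."""
    interiors = [set(p[1:-1]) for p in simple_paths(succ, divergent, candidate)]
    return any(a.isdisjoint(b)
               for i, a in enumerate(interiors) for b in interiors[i + 1:])


def meets_claim_conditions(succ, entry, exit_node, divergent, candidate):
    """True when both determining steps of the claim hold for (divergent, candidate)."""
    if not has_two_disjoint_paths(succ, divergent, candidate):
        return False
    dom = dominators(succ, entry)
    pdom = dominators(reverse(succ), exit_node)
    return divergent not in dom[candidate] or candidate not in pdom[divergent]


# Hypothetical graph: C can leave through X, so D does not post-dominate A.
irregular = {"entry": ["A"], "A": ["B", "C"], "B": ["D"], "C": ["D", "X"],
             "D": ["exit"], "X": ["exit"], "exit": []}
# Plain if/else diamond: A dominates D and D post-dominates A.
diamond = {"entry": ["A"], "A": ["B", "C"], "B": ["D"], "C": ["D"],
           "D": ["exit"], "exit": []}
```

In this sketch, `meets_claim_conditions(irregular, "entry", "exit", "A", "D")` holds because some threads can bypass `D` through `X`, whereas the `diamond` graph fails the second condition since its threads reconverge at `D` without any transformation.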
Address |
Santa Clara CA US |