Confluence analysis and loop fast-forwarding for improving SIMD execution efficiency, Application No. US201414160426

Title of invention	Confluence analysis and loop fast-forwarding for improving SIMD execution efficiency
Abstract	One embodiment of the present invention sets forth a method for causing thread convergence. The method includes determining that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node. The method also includes determining that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node. The method further includes identifying an external node and inserting a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node. The method additionally includes inserting into the program a second divergent node configured to cause various threads to execute or not execute a first control flow path associated with the external node.
Publication number	US9612811(B2)	Publication date	2017.04.04
Application number	US201414160426	Filing date	2014.01.21
Applicant	NVIDIA Corporation	Inventors	Sabne Amit Jayant;Lin Yuan;Grover Vinod
Classification	G06F9/45	Main classification	G06F9/45
Agency	Artegis Law Group, LLP	Agent	Artegis Law Group, LLP
Principal claim	1. A method for causing thread convergence in a parallel execution environment, wherein the method is implemented by a compiler executing on a processor, the method comprising: determining, by the compiler executing on the processor, that a control flow graph representing a first section of a program includes at least two non-overlapping paths that extend from a first divergent node to a candidate node; determining, by the compiler executing on the processor, that the first divergent node is not a dominator of the candidate node or that the candidate node is not a post-dominator of the first divergent node; identifying, by the compiler executing on the processor, an external node in the control flow graph; inserting into the program, by the compiler executing on the processor, a first instruction configured to cause a predicate variable to be set to true for a first set of threads that is to execute the external node; and inserting into the program, by the compiler executing on the processor, a second divergent node that is configured to cause the first set of threads to execute a first control flow path associated with the external node and to cause a second set of threads for which the predicate variable is set to false to not execute the first control flow path.
Address	Santa Clara, CA, US
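
The two determining steps in the abstract and claim can be illustrated with a small sketch. It is not the patented compiler pass: the graph encoding (a dict of successor lists), all function and node names, and the reading of "non-overlapping" as "sharing no intermediate nodes" are assumptions made for illustration. The sketch computes dominators with the standard iterative data-flow method, obtains post-dominators by running the same routine on the reversed graph, and reports whether a divergent/candidate pair meets the condition under which the claimed method goes on to identify an external node.

```python
from itertools import combinations


def dominators(cfg, root):
    """Return {node: set of nodes dominating it} by iterating to a fixed point."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            incoming = [dom[p] for p in preds[n]]
            new = ({n} | set.intersection(*incoming)) if incoming else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(cfg):
    """Flip every edge; dominators of the reversed graph are post-dominators."""
    rev = {n: [] for n in cfg}
    for src, succs in cfg.items():
        for dst in succs:
            rev[dst].append(src)
    return rev


def simple_paths(cfg, src, dst, path=None):
    """Yield every cycle-free path from src to dst (fine for tiny graphs only)."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in cfg[src]:
        if nxt not in path:
            yield from simple_paths(cfg, nxt, dst, path)


def has_two_disjoint_paths(cfg, src, dst):
    """Assumed reading of 'non-overlapping': no shared intermediate nodes."""
    paths = list(simple_paths(cfg, src, dst))
    return any(set(p[1:-1]).isdisjoint(q[1:-1]) for p, q in combinations(paths, 2))


def meets_claimed_condition(cfg, entry, exit_, divergent, candidate):
    """True when both determining steps of the claim hold for this pair."""
    if not has_two_disjoint_paths(cfg, divergent, candidate):
        return False
    dom = dominators(cfg, entry)
    pdom = dominators(reverse(cfg), exit_)
    return divergent not in dom[candidate] or candidate not in pdom[divergent]


# Hypothetical CFG: X reaches D without passing through A, so A does not dominate D.
unstructured = {
    "entry": ["A", "X"], "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "X": ["D"], "D": ["exit"], "exit": [],
}
# Same graph without X: a plain if/else diamond where A dominates D.
structured = {
    "entry": ["A"], "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["exit"], "exit": [],
}

print(meets_claimed_condition(unstructured, "entry", "exit", "A", "D"))
print(meets_claimed_condition(structured, "entry", "exit", "A", "D"))
```

In the first graph the check succeeds because node X offers a route to D that bypasses A; in the plain diamond it fails because A dominates D and D post-dominates A. Treating X as the kind of node the claim calls an external node is an assumption of this sketch, since the record does not reproduce the patent's definition. Path enumeration is exponential in general and is used here only to keep the example short.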