Partitioning CUDA code for execution by a general purpose processor (Application No. US200912415075)

Title of invention	Partitioning CUDA code for execution by a general purpose processor
Abstract	One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
Publication number	US8776030(B2)	Publication date	2014.07.08
Application number	US200912415075	Filing date	2009.03.31
Applicant	NVIDIA Corporation	Inventors	Grover, Vinod; Aarts, Bastiaan Joannes Matheus; Murphy, Michael
Classification	G06F9/44	Main classification	G06F9/44
Agency	Patterson & Sheridan, LLP	Agent	Patterson & Sheridan, LLP
Principal claim	1. A computer-implemented method for partitioning an application program that comprises a plurality of statements, the method comprising: selecting a statement in the plurality of statements to analyze; if the statement is not a synchronization barrier instruction, then adding the statement to a current partition, or if the statement is a synchronization barrier instruction or if the statement is a start of a control-flow construct that includes a synchronization barrier instruction, then ending the current partition and storing the current partition in an output list of partitions; beginning a new current partition and repeating the steps of selecting, adding, and ending until all statements in the plurality of statements have been analyzed; annotating each statement in the application program with a corresponding variance vector that is a representation of a set configured to indicate thread dimensions on which the statement depends; and reordering statements in a partition in the output list of partitions to cause statements in the partition that have fewer dimensions in their corresponding variance vectors to precede statements in the partition that have more dimensions in their corresponding variance vectors.
Address	Santa Clara CA US
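
The partitioning and reordering steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Stmt` class, its field names, and the functions `partition` and `reorder_by_variance` are hypothetical names chosen for illustration. The sketch also makes three simplifying assumptions: empty partitions are not stored, a control-flow construct that encloses a barrier is kept whole in a partition of its own, and the statements inside a partition have no data dependencies that would forbid reordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """Hypothetical statement record used only for this illustration."""
    text: str
    is_barrier: bool = False           # e.g. a __syncthreads() call
    encloses_barrier: bool = False     # start of a control-flow construct containing a barrier
    variance: frozenset = frozenset()  # variance vector: thread dimensions the statement depends on


def partition(statements):
    """Split a statement list into partitions that contain no barrier."""
    partitions, current = [], []
    for stmt in statements:
        if stmt.is_barrier or stmt.encloses_barrier:
            # End the current partition and store it in the output list.
            if current:
                partitions.append(current)
            current = []
            if stmt.encloses_barrier:
                # Assumption: keep the construct whole as its own partition.
                partitions.append([stmt])
        else:
            current.append(stmt)
    if current:
        partitions.append(current)
    return partitions


def reorder_by_variance(part):
    """Place statements that depend on fewer thread dimensions first.

    sorted() is stable, so statements with equally sized variance
    vectors keep their original relative order.
    """
    return sorted(part, key=lambda s: len(s.variance))


kernel = [
    Stmt("a = tid_x + tid_y", variance=frozenset({"x", "y"})),
    Stmt("b = 42"),
    Stmt("c = tid_x", variance=frozenset({"x"})),
    Stmt("__syncthreads()", is_barrier=True),
    Stmt("d = shared[tid_x]", variance=frozenset({"x"})),
]

for part in partition(kernel):
    print([s.text for s in reorder_by_variance(part)])
```

In this example the barrier splits the five statements into two partitions, and within the first partition the thread-invariant assignment to `b` moves ahead of the statements that depend on one and then two thread dimensions.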

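The thread loops and replicated variables mentioned in the abstract can be pictured with a small Python sketch of a translated kernel. It is an illustration under stated assumptions, not the patented translator's output: the function name `kernel_with_thread_loops` and the variables `shared`, `v`, and `out` are hypothetical, and a one-dimensional thread block is assumed.

```python
def kernel_with_thread_loops(n_threads):
    """Serial emulation of a kernel of the form:

        v = tid * 10; shared[tid] = v;
        __syncthreads();
        out[tid] = v + shared[(tid + 1) % n_threads];
    """
    shared = [0] * n_threads  # stands in for GPU shared memory
    out = [0] * n_threads

    # 'v' is per-thread (divergent) and is used on both sides of the
    # barrier, so it is replicated: one slot per thread ID.
    v = [0] * n_threads

    # Thread loop around the partition before the barrier.
    for tid in range(n_threads):
        v[tid] = tid * 10
        shared[tid] = v[tid]

    # The barrier becomes the boundary between the two loops: every
    # thread ID has finished the first partition before any starts the second.

    # Thread loop around the partition after the barrier.
    for tid in range(n_threads):
        out[tid] = v[tid] + shared[(tid + 1) % n_threads]

    return out


print(kernel_with_thread_loops(4))
```

Without the replication of `v`, the second loop would see only the value left by the last iteration of the first loop, which is why values that differ per thread and cross a partition boundary need one slot per thread.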