Title: System and method for hardware scheduling of conditional barriers and impatient barriers
Abstract: A method and a system are provided for hardware scheduling of barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated, and when each thread reaches the barrier instruction during execution of the program, it is determined whether the thread participates in the barrier instruction. The threads that participate in the barrier instruction are then serially executed to process one or more instructions of the program that follow the barrier instruction. A method and system are also provided for impatient scheduling of barrier instructions. When a portion of the threads that is greater than a minimum number of threads and less than all of the threads in the plurality of threads reaches the barrier instruction, each of the threads in the portion is serially executed to process one or more instructions of the program that follow the barrier instruction.
Publication number: US9448803(B2) Publication date: 2016.09.20
Application number: US201313794578 Filing date: 2013.03.11
Applicant: NVIDIA Corporation Inventors: Lindholm John Erik; Karras Tero Tapani; Aila Timo Oskari; Laine Samuli Matias
Classification: G06F9/30; G06F9/38; G06F9/48; G06F9/52 Main classification: G06F9/30
Agency: Zilka-Kotab, PC Agent: Zilka-Kotab, PC
Principal claim: 1. A method comprising: initiating execution of a plurality of threads to process instructions of a program that includes a barrier instruction; for each thread in the plurality of threads, determining whether the thread participates in the barrier instruction when the thread reaches the barrier instruction during execution of the thread; prior to executing each of the threads that participate in the barrier instruction, waiting for a portion of the threads that participate in the barrier instruction to reach the barrier instruction and for conditions that allow the barrier instruction to be scheduled as an impatient barrier to be met; and serially executing each of the threads that participate in the barrier instruction to process one or more instructions of the program that follow the barrier instruction, wherein during execution of each of the threads that participate in the barrier instruction an additional thread is identified as a late arriving thread that participates in the barrier instruction and that is not included in the portion of the threads and the additional thread is executed to process the one or more instructions of the program that follow the barrier instruction.
Address: Santa Clara CA US
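
Illustrative sketch (not part of the patent record): the behaviour described in the abstract and in claim 1 can be modelled in software as a small simulation. The Python below is a minimal sketch under stated assumptions, not the patented hardware design. The class name `ImpatientBarrierSim`, the `min_threads` threshold, and the `arrive`/`step` methods are hypothetical names introduced for illustration only. It assumes that non-participating threads simply pass the barrier, that the barrier becomes schedulable once the number of waiting participants exceeds `min_threads`, and that a participant arriving after that point is treated as a late arriving thread and appended to the same serial execution queue.

```python
from collections import deque


class ImpatientBarrierSim:
    """Toy software model of a conditional, impatient barrier (hypothetical)."""

    def __init__(self, participants, min_threads):
        self.participants = set(participants)  # threads that participate in the barrier
        self.min_threads = min_threads         # assumed "minimum number of threads"
        self.waiting = deque()                 # participants queued at the barrier
        self.scheduled = False                 # True once the impatient condition is met
        self.executed = []                     # order in which post-barrier code ran
        self.late = []                         # participants that arrived after scheduling

    def arrive(self, tid):
        """Record that thread `tid` has reached the barrier instruction."""
        if tid not in self.participants:
            # Conditional barrier: a non-participant is not held at the barrier.
            return "skipped"
        self.waiting.append(tid)
        if self.scheduled:
            # The barrier was already scheduled: treat as a late arriving thread.
            self.late.append(tid)
            return "late"
        if len(self.waiting) > self.min_threads:
            # More than the minimum number have arrived: stop waiting for the rest.
            self.scheduled = True
        return "waiting"

    def step(self):
        """Serially execute the post-barrier code of one queued thread, if allowed."""
        if not self.scheduled or not self.waiting:
            return None
        tid = self.waiting.popleft()
        self.executed.append(tid)
        return tid


if __name__ == "__main__":
    sim = ImpatientBarrierSim(participants={0, 1, 2, 3}, min_threads=1)
    sim.arrive(0)   # first participant waits
    sim.arrive(4)   # thread 4 does not participate
    sim.arrive(1)   # threshold exceeded, barrier becomes schedulable
    sim.step()      # thread 0 runs the post-barrier instructions
    sim.arrive(2)   # arrives during serial execution: late arriving thread
    while sim.step() is not None:
        pass
    print(sim.executed, sim.late)
```

In this model the serial execution queue doubles as the late-arrival list, so a thread that reaches the barrier while earlier participants are still being executed is picked up without a second synchronization. Real hardware would track this state per barrier and per thread group; the sketch only mirrors the ordering described in the claim.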