摘要 |
Parallel data processing systems and methods use cooperative thread arrays (CTAs), i.e., groups of multiple threads that concurrently execute the same program on an input data set to produce an output data set. Each thread in a CTA has a unique identifier (thread ID) that can be assigned at thread launch time and that controls various aspects of the thread's processing behavior, such as the portion of the input data set to be processed by each thread, the portion of the output data set to be produced by each thread, and/or sharing of intermediate results among threads. Where groups of threads are executed in SIMD parallelism, thread IDs for threads in the same SIMD group are generated and assigned in parallel, allowing different SIMD groups to be launched in rapid succession.
|