Abstract |
<p>When an atomic operation is to be executed for a thread group by an execution stage of a data processing system, it is determined (51) whether the thread group contains a set of threads for which the atomic operation accesses the same memory location. If such a set is identified, the atomic operation for the set is performed (52) by: providing the first thread's register value for the atomic operation to the second thread in the set; performing, for the second thread, the arithmetic operation using the second thread's register value and the first thread's register value; and performing, for each subsequent thread, the arithmetic operation using that thread's register value and the result of the arithmetic operation for the preceding thread, so as to generate for the final thread a combined result (47) of the arithmetic operation. A single atomic memory operation for the set is then executed (54) using the combined result as its register argument. The arithmetic operation for the first thread may be performed using an identity value for the arithmetic operation and the first thread's register value. The system may comprise a wavefront- or warp-based GPU architecture, with a graphics processing pipeline including shader stages.</p> |
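
The combining scheme described in the abstract can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the claimed hardware implementation: the function name `execute_group_atomic`, the dictionary standing in for memory, and the list-based thread group are all hypothetical, and the values an atomic would return to each thread are not modelled. A thread whose address is unique within the group is treated here as a set of one, which yields the same memory result as issuing its atomic directly.

```python
from collections import defaultdict
import operator


def execute_group_atomic(memory, addresses, values, op=operator.add, identity=0):
    """Simulate one atomic operation issued by every thread in a thread group.

    memory       -- dict mapping address -> stored value (hypothetical stand-in
                    for device memory)
    addresses[i] -- memory location targeted by thread i
    values[i]    -- register value (the atomic's argument) of thread i
    op, identity -- the arithmetic operation and its identity value

    Returns the number of atomic memory operations actually issued.
    """
    # Determine which threads in the group access the same memory location.
    sets = defaultdict(list)
    for thread_id, addr in enumerate(addresses):
        sets[addr].append(thread_id)

    issued = 0
    for addr, thread_ids in sets.items():
        # Chain the arithmetic through the set: the first thread combines the
        # identity value with its own register value, and each later thread
        # combines its register value with the preceding thread's result.
        running = identity
        for t in thread_ids:
            running = op(running, values[t])

        # Execute a single atomic memory operation for the whole set, using
        # the combined result held by the final thread as its argument.
        memory[addr] = op(memory[addr], running)
        issued += 1
    return issued


if __name__ == "__main__":
    memory = {0x10: 100, 0x20: 5}
    addresses = [0x10, 0x10, 0x20, 0x10]  # threads 0, 1 and 3 collide
    values = [1, 2, 3, 4]
    issued = execute_group_atomic(memory, addresses, values)
    print(issued)        # 2 memory operations instead of 4
    print(memory[0x10])  # 107
    print(memory[0x20])  # 8
```

In this example four per-thread atomic adds collapse into two memory operations. The same function works for other associative operations when a matching identity is supplied, for instance `op=max` with `identity=float("-inf")`.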