Title: DATA PROCESSING SYSTEMS
Abstract: When an atomic operation is to be executed for a thread group by an execution stage of a data processing system, it is determined whether there is a set of threads for which the atomic operation for the threads accesses the same memory location. If so, the arithmetic operation for the atomic operation is performed for the first thread in the set of threads using an identity value for the arithmetic operation for the atomic operation and the first thread's register value for the atomic operation, and is performed for each other thread in the set of threads using the thread's register value for the atomic operation and the result of the arithmetic operation for the preceding thread in the set of threads, to thereby generate for the final thread in the identified set of threads a combined result of the arithmetic operation for the set of threads.
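The chained computation described in the abstract can be modelled as a serial scan per memory address. The following is a minimal Python sketch, not the patented implementation: the function name `combine_per_address`, the use of a per-thread address list, and the assumption that thread order equals lane order are all illustrative. On hardware the threads run in lockstep; the sequential loop here only models the data dependency in which each thread's result feeds the next thread that targets the same address.

```python
from typing import Callable, Dict, List, Tuple


def combine_per_address(
    addresses: List[int],
    register_values: List[int],
    op: Callable[[int, int], int],
    identity: int,
) -> Tuple[List[int], Dict[int, int]]:
    """Hypothetical model of the per-set chained arithmetic operation.

    Returns each thread's running result and, per address, the result
    held by the final thread of that set (the combined result).
    """
    running: List[int] = [0] * len(addresses)
    # Maps address -> result of the preceding thread at that address.
    last_result: Dict[int, int] = {}
    for tid, addr in enumerate(addresses):
        # The first thread of a set starts from the identity value;
        # every later thread starts from the preceding thread's result.
        prev = last_result.get(addr, identity)
        running[tid] = op(prev, register_values[tid])
        last_result[addr] = running[tid]
    # After the loop, last_result holds each set's combined result.
    return running, last_result
```

For an atomic add (identity 0) where threads 0, 2 and 3 target address `0x10` and thread 1 targets `0x20`, the final thread at `0x10` ends up holding the sum of all three register values, so a single memory operation can carry the whole set's contribution.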
Publication No.: US2014366033 (A1)  Publication date: 2014.12.11
Application No.: US201313913334  Filing date: 2013.06.07
Applicant: ARM Limited  Inventor: Nystad Jorn
Classification: G06F9/50  Main classification: G06F9/50
Main claim: 1. A method of operating a data processing system which includes an execution pipeline that includes one or more programmable execution stages which execute instructions to perform data processing operations, and in which execution threads may be grouped together into thread groups in which the threads of the group are executed in lockstep, one instruction at a time, the method comprising: for an atomic operation to be executed for a thread group by an execution stage of the execution pipeline, the atomic operation having an associated arithmetic operation: issuing to the execution stage an instruction or instructions to determine whether there is a set of threads in the thread group for which the atomic operation for the threads accesses the same memory location; and to, if such a set of threads is identified, perform the atomic operation for the set of threads by: providing to the second thread in the set of threads the first thread's register value for the atomic operation; performing for the second thread in the set of threads the arithmetic operation for the atomic operation using the second thread's register value for the atomic operation and the first thread's register value for the atomic operation; and performing for each thread in the set of threads other than the first and second threads, if any, the arithmetic operation for the atomic operation using the thread's register value for the atomic operation and the result of the arithmetic operation for the preceding thread in the set of threads, to thereby generate for the final thread in the identified set of threads a combined result of the arithmetic operation for the set of threads; and then executing, for the identified set of threads, a single atomic memory operation to the memory location for the atomic operation for the set of threads using the combined result of the arithmetic operation for the set of threads as its register argument; and the execution stage of the execution pipeline in response to the instructions: determining whether there is a set of threads in the thread group for which the atomic operation for the threads accesses the same memory location; and, if such a set of threads is identified, performing the atomic operation for the set of threads by: providing to the second thread in the set of threads the first thread's register value for the atomic operation; performing for the second thread in the set of threads the arithmetic operation for the atomic operation using the second thread's register value for the atomic operation and the first thread's register value for the atomic operation; and performing for each thread in the set of threads other than the first and second threads, if any, the arithmetic operation for the atomic operation using the thread's register value for the atomic operation and the result of the arithmetic operation for the preceding thread in the set of threads, to thereby generate for the final thread in the identified set of threads a combined result of the arithmetic operation for the set of threads; and then executing, for the identified set of threads, a single atomic memory operation to the memory location for the atomic operation for the set of threads using the combined result of the arithmetic operation for the set of threads as its register argument.
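The end-to-end flow of the claim (identify same-address sets, chain the arithmetic operation from the first thread's register value onward, then issue one atomic memory operation per set) can be sketched as a small simulation. This is an illustrative Python model under stated assumptions, not ARM's implementation: `run_thread_group_atomic` is a hypothetical name, memory is modelled as a dict that already contains every target address, the atomic read-modify-write is modelled as a plain update, and a thread whose address is unique is treated as a set of one that issues its own atomic with its register value.

```python
from typing import Callable, Dict, List


def run_thread_group_atomic(
    memory: Dict[int, int],
    addresses: List[int],
    register_values: List[int],
    op: Callable[[int, int], int],
) -> int:
    """Hypothetical simulation of one atomic instruction for a lockstep thread group.

    Returns the number of atomic memory operations actually issued.
    """
    # Group thread indices by target address; dicts keep insertion order,
    # so each list is in thread order.
    sets: Dict[int, List[int]] = {}
    for tid, addr in enumerate(addresses):
        sets.setdefault(addr, []).append(tid)

    atomics_issued = 0
    for addr, tids in sets.items():
        # The first thread's register value is passed to the second thread,
        # so this variant needs no identity value.
        combined = register_values[tids[0]]
        # Each later thread combines its own register value with the
        # result of the preceding thread in the set.
        for tid in tids[1:]:
            combined = op(combined, register_values[tid])
        # A single atomic memory operation per set, using the combined
        # result as its register argument (modelled as a plain update).
        memory[addr] = op(memory[addr], combined)
        atomics_issued += 1
    return atomics_issued
```

With four threads adding `1, 2, 3, 4` where three of them target the same address, the sketch issues two memory operations instead of four while leaving memory in the same final state that four separate atomic adds would produce. This equivalence relies on the arithmetic operation being associative, which holds for the usual atomic operations such as add, min, max, AND, OR and XOR.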
Address: Cambridge, GB