Title of invention: OPTIMIZE CONTROL-FLOW CONVERGENCE ON SIMD ENGINE USING DIVERGENCE DEPTH
Abstract: A system, a method, and a computer program product are provided for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on a SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments the lane depth counters of all active lanes when the thread-PC reaches a branch operation. The machine updates the lane-PC of each active lane according to the targets of the branch operation. The machine selects an active lane and activates only the lanes whose lane-PCs match the thread-PC. When the instruction stream reaches a first instruction, the machine decrements the lane depth counters of the selected active lanes and updates the lane-PC of each active lane. The machine then assigns the lane-PC of the lane with the largest lane depth counter value to the thread-PC and activates all lanes whose lane-PCs match the thread-PC.
Publication number: US2016062771(A1) Publication date: 2016.03.03
Application number: US201414468904 Filing date: 2014.08.26
Applicant: International Business Machines Corporation Inventors: Almasi Gheorghe; Moreira Jose; Tseng Jessica H.; Wu Peng
Classification: G06F9/38; G06F9/45 Main classification: G06F9/38
Agency: Agent:
Main claim: 1. A method for selecting an active data stream while running a SPMD (Single Program Multiple Data) program of instructions on a SIMD (Single Instruction Multiple Data) machine, an instruction stream having one thread-PC (Program Counter), the thread-PC indicating an instruction memory address which stores an instruction to be fetched next for the instruction stream; running the instruction stream over one or more input data streams (“lanes”), each lane being associated with a corresponding lane depth counter, a corresponding lane-PC of a lane indicating a memory address which stores the instruction to be fetched next for the lane when the lane is activated, and a lane activation bit indicating whether a corresponding lane is active or not; incrementing lane depth counters of all active lanes upon the thread-PC reaching a branch operation in the instruction stream; updating the lane-PC of each active lane according to targets of the branch operation; and selecting one or more active lanes and assigning a corresponding lane-PC to the thread-PC, and activating only lanes whose lane-PC matches the thread-PC; decrementing the lane depth counters of the selected active lanes and updating the lane-PC of each active lane upon the instruction stream reaching a first instruction; and assigning the lane-PC of a lane with a largest lane depth counter value to the thread-PC and activating all lanes whose lane-PCs match the thread-PC, wherein a plurality of processors coupled to one or more memory devices perform the running, the incrementing, the assigning, the activating and the decrementing until the thread-PC reaches an end of the instruction stream and the lane-PCs of all lanes match with the thread-PC.
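Illustrative sketch (not part of the patent record): the steps recited in the claim can be modeled with a small software simulation. The version below is a minimal Python sketch under several assumptions that do not come from the source. The instruction encoding ("op", "jump", "branch", "converge"), the names Lane and run_spmd, the choice of the first active lane after a branch, and the treatment of "converge" as the claim's "first instruction" are all hypothetical. It also assumes every branch is paired with a converge instruction at its join point.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    data: int            # the lane's input data stream (one value here)
    pc: int = 0          # lane-PC
    depth: int = 0       # lane depth counter
    active: bool = True  # lane activation bit


def activate_matching(lanes, thread_pc):
    """Activate exactly the lanes whose lane-PC equals the thread-PC."""
    for lane in lanes:
        lane.active = lane.pc == thread_pc


def run_spmd(program, inputs):
    """Run one instruction stream over all lanes (hypothetical encoding)."""
    lanes = [Lane(data=x) for x in inputs]
    end = len(program)
    thread_pc = 0
    while True:
        if thread_pc == end:
            # Stop only when every lane-PC matches the thread-PC at the end.
            pending = [l for l in lanes if l.pc != end]
            if not pending:
                break
            thread_pc = max(pending, key=lambda l: l.depth).pc
            activate_matching(lanes, thread_pc)
        kind, *args = program[thread_pc]
        active = [l for l in lanes if l.active]
        if kind == "op":
            (fn,) = args
            for l in active:
                l.data = fn(l.data)
                l.pc = thread_pc + 1
            thread_pc += 1
        elif kind == "jump":
            (target,) = args
            for l in active:
                l.pc = target
            thread_pc = target
        elif kind == "branch":
            cond, target = args
            for l in active:
                l.depth += 1  # one level deeper in divergence
                l.pc = target if cond(l.data) else thread_pc + 1
            # Assumption: follow the first active lane's target.
            thread_pc = active[0].pc
            activate_matching(lanes, thread_pc)
        elif kind == "converge":
            for l in active:
                l.depth -= 1
                l.pc = thread_pc + 1
            # Resume the lane with the largest depth counter.
            thread_pc = max(lanes, key=lambda l: l.depth).pc
            activate_matching(lanes, thread_pc)
        else:
            raise ValueError(f"unknown instruction: {kind}")
    return [l.data for l in lanes]


# if x is odd: x += 100, else: x *= 2; then x += 1 on all lanes
program = [
    ("branch", lambda x: x % 2 == 1, 3),
    ("op", lambda x: x * 2),
    ("jump", 4),
    ("op", lambda x: x + 100),
    ("converge",),
    ("op", lambda x: x + 1),
]
result = run_spmd(program, [1, 2, 3, 4])
```

In this sketch the odd lanes run first and reach the converge instruction with depth 0, while the even lanes still hold depth 1. Picking the largest depth counter therefore resumes the lanes that are lagging, so all four lanes are active together again for the final instruction.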
Address: Armonk, NY, US