Title of invention: Multithreaded processor with multiple concurrent pipelines per thread
Abstract: A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.
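The issuance rule in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the round-robin sequence, the four threads, the ten-stage depth, and the helper names `issue_schedule` and `in_flight` are all hypothetical, and each issue slot is assumed to carry exactly one instruction.

```python
from itertools import cycle, islice


def issue_schedule(sequence, num_cycles):
    """Return the thread permitted to issue on each clock cycle.

    `sequence` is the (assumed) instruction issuance sequence; it is
    repeated until `num_cycles` cycles are covered.
    """
    return list(islice(cycle(sequence), num_cycles))


def in_flight(schedule, thread, depth, at_cycle):
    """Count instructions of `thread` still in the pipeline at `at_cycle`.

    Assumes one instruction per issue slot and a pipeline of `depth`
    stages, one stage per clock cycle.
    """
    return sum(
        1
        for issue_cycle, t in enumerate(schedule)
        if t == thread and issue_cycle <= at_cycle < issue_cycle + depth
    )


# Hypothetical configuration: 4 threads issued round-robin.
schedule = issue_schedule([0, 1, 2, 3], 12)

# With a 10-stage pipeline, thread 0 has several instructions in flight
# at once, i.e. multiple concurrent pipelines for a single thread.
deep = in_flight(schedule, thread=0, depth=10, at_cycle=8)

# With a 4-stage pipeline, each instruction finishes before the thread's
# next issue slot, so only one is in flight.
shallow = in_flight(schedule, thread=0, depth=4, at_cycle=8)
```

In this sketch, whenever the pipeline depth exceeds the number of threads, a thread's next issue slot arrives before its previous instruction has completed, which is the situation the abstract describes as multiple concurrent instruction pipelines per thread.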
Publication number: US8762688(B2) Publication date: 2014.06.24
Application number: US201113282800 Filing date: 2011.10.27
Applicant: QUALCOMM Incorporated Inventors: Hokenek Erdem; Moudgill Mayan; Schulte Michael J.; Glossner C. John
Classification: G06F9/00 Main classification: G06F9/00
Agency: Knobbe Martens Olson & Bear LLP Agent: Knobbe Martens Olson & Bear LLP
Main claim: 1. A multithreaded processor comprising: means for permitting a thread to issue one or more instructions on a processor clock cycle; means for varying the thread permitted to issue instructions over a plurality of clock cycles in accordance with an instruction issuance sequence; and means for pipelining the instructions to permit the threads to support multiple concurrent instruction pipelines, wherein the pipelined instructions comprise at least a vector multiplication and reduction instruction that includes an instruction decode stage, a vector register file read stage, at least two multiply stages, at least two add stages, an accumulator read stage, a plurality of reduction stages, and an accumulator writeback stage; wherein the vector multiplication and reduction instruction is pipelined using a number of stages which is greater than a total number of threads of the processor; and wherein vector multiplication and reduction instruction pipelines are shifted relative to one another to permit computation cycles which are longer than issue cycles without forwarding logic to allow lengthening of execution phases without causing bubbles in the pipelines.
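The stage list in the claim can be laid out as a shifted pipeline to see how one thread's successive instructions overlap. The sketch below is illustrative only: it takes the minimum stage counts the claim permits (two multiply, two add, and two reduction stages, for ten stages in total), and the stage names, the one-stage-per-cycle timing, the four-thread round-robin issue, and the helpers `stage_at` and `occupancy` are assumptions rather than details from the patent.

```python
# Hypothetical stage names using the minimum counts allowed by the claim.
STAGES = [
    "decode", "vrf_read",
    "mpy1", "mpy2",
    "add1", "add2",
    "acc_read",
    "reduce1", "reduce2",
    "acc_wb",
]


def stage_at(issue_cycle, cycle):
    """Stage occupied at `cycle` by an instruction issued at `issue_cycle`.

    Returns None if the instruction has not issued yet or has retired.
    Assumes each stage takes exactly one clock cycle.
    """
    idx = cycle - issue_cycle
    return STAGES[idx] if 0 <= idx < len(STAGES) else None


def occupancy(num_threads, num_issues, cycle):
    """Stages held at `cycle` by one thread's successive instructions.

    Assumes round-robin issue, so the thread issues once every
    `num_threads` cycles, starting at cycle 0.
    """
    held = {}
    for k in range(num_issues):
        stage = stage_at(k * num_threads, cycle)
        if stage is not None:
            held[k] = stage
    return held


# Hypothetical case: 4 threads, so this thread issues at cycles 0, 4, 8.
snapshot = occupancy(num_threads=4, num_issues=3, cycle=8)
```

Under these assumptions the pipeline has more stages than there are threads, and at cycle 8 the thread's three instructions sit in three different stages, each shifted by one issue interval from the previous one.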
Address: San Diego CA US