Title of invention: Predecode logic autovectorizing a group of scalar instructions including result summing add instruction to a vector instruction for execution in vector unit with dot product adder
Abstract: A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.
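The substitution described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the mnemonics `fmul`, `fadd`, and `vdot`, the `Instr` record, and the fixed two-multiply window are all hypothetical, and the model assumes the temporary product registers are not read again after the add.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                 # hypothetical mnemonic: "fmul", "fadd", "vdot", ...
    dst: str                # destination register name
    srcs: Tuple[str, ...]   # source register names


def _is_dot_pattern(m0: Instr, m1: Instr, add: Instr) -> bool:
    """True if two independent multiplies feed an add that sums their results."""
    return (
        m0.op == "fmul" and m1.op == "fmul" and add.op == "fadd"
        and m0.dst != m1.dst
        and m0.dst not in m1.srcs               # second product does not depend on the first
        and len(add.srcs) == 2
        and set(add.srcs) == {m0.dst, m1.dst}   # the add sums exactly those two products
    )


def predecode(buffer: List[Instr]) -> List[Instr]:
    """Return a new stream where each matching fmul, fmul, fadd group
    is replaced by a single (hypothetical) vdot instruction."""
    out: List[Instr] = []
    i = 0
    while i < len(buffer):
        group = buffer[i:i + 3]
        if len(group) == 3 and _is_dot_pattern(*group):
            m0, m1, add = group
            # Assumption: the temporaries m0.dst and m1.dst are dead after the add.
            out.append(Instr("vdot", add.dst, m0.srcs + m1.srcs))
            i += 3
        else:
            out.append(buffer[i])
            i += 1
    return out
```

For example, a stream `fmul t0,a0,b0; fmul t1,a1,b1; fadd d,t0,t1` would collapse to a single `vdot d,a0,b0,a1,b1`, while instructions that do not fit the pattern pass through unchanged.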
Publication number: US8984260(B2) Publication date: 2015.03.17
Application number: US201113330888 Filing date: 2011.12.20
Applicant: International Business Machines Corporation Inventors: Muff Adam J.; Schardt Paul E.; Shearer Robert A.; Tubbs Matthew R.
Classification: G06F9/302; G06F17/16; G06F9/30; G06F9/455 Main classification: G06F9/302
Agency: Wood, Herron & Evans, LLP Agent: Wood, Herron & Evans, LLP
Independent claim: 1. A method for executing scalar instructions in a processor including a vector execution unit including a plurality of processing lanes, each processing lane configured to perform an operation of a vector instruction to generate a processing lane result, and a dot product adder configured to sum the processing lane results to generate a vector instruction result, the method comprising: analyzing instructions stored in an instruction buffer associated with the processor utilizing a predecode logic coupled to the instruction buffer to identify a plurality of scalar instructions in an instruction stream for which a functionally equivalent vector instruction exists, wherein the plurality of scalar instructions includes a subset of scalar instructions generating independent results and a scalar add instruction for performing an add operation of the independent results; generating the functionally equivalent vector instruction based on the plurality of scalar instructions, wherein the vector instruction is configured to perform an operation based on each scalar instruction of the subset of scalar instructions and an add operation based on the scalar add instruction; substituting the functionally equivalent vector instruction for the plurality of scalar instructions in the instruction stream; and executing the functionally equivalent vector instruction in the vector execution unit to generate a vector instruction result by: performing each operation of the vector instruction corresponding to each scalar instruction of the subset of scalar instructions with a processing lane of the vector execution unit to generate a plurality of processing lane results, and performing the add operation of the vector instruction corresponding to the scalar add instruction with the dot product adder of the vector execution unit to generate the vector instruction result.
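The execution step of the claim, in which each processing lane produces one result and the dot product adder sums them, might be modeled as follows. This is a minimal behavioral sketch, not a description of the actual hardware: the function names `execute_vector` and `dot_product_adder`, the four-lane width in the usage note, and the use of multiplication as the per-lane operation are assumptions made for illustration.

```python
from operator import mul
from typing import Callable, List, Sequence

Number = float
LaneOp = Callable[[Number, Number], Number]


def dot_product_adder(lane_results: Sequence[Number]) -> Number:
    """Sum the per-lane results into one vector instruction result."""
    return sum(lane_results)


def execute_vector(lane_ops: Sequence[LaneOp],
                   a: Sequence[Number],
                   b: Sequence[Number]) -> Number:
    """Run one operation per processing lane, then reduce with the adder.

    Each lane stands in for one of the independent scalar instructions;
    the final reduction stands in for the scalar add instruction.
    """
    if not (len(lane_ops) == len(a) == len(b)):
        raise ValueError("one operation and one operand pair per lane")
    lane_results: List[Number] = [op(x, y) for op, x, y in zip(lane_ops, a, b)]
    return dot_product_adder(lane_results)
```

With four multiply lanes, `execute_vector([mul] * 4, [1, 2, 3, 4], [5, 6, 7, 8])` gives the same value as computing the four products with separate scalar multiplies and then adding them.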
Address: Armonk, NY, US