Title of invention: Apparatus and method for performing fused multiply add floating point operation
Abstract: A fused multiply add floating point unit 1 includes multiplying circuitry 4 and adding circuitry 8. The multiplying circuitry 4 multiplies operands B and C having N-bit significands to generate an unrounded product B*C. The unrounded product B*C has an M-bit significand, where M>N. The adding circuitry 8 receives an operand A that is input at a later processing cycle than a processing cycle at which the multiplying circuitry 4 receives operands B and C. The adding circuitry 8 commences processing of the operand A after the unrounded product B*C is generated by the multiplying circuitry 4. The adding circuitry 8 adds the operand A to the unrounded product B*C and outputs a rounded result A+B*C.
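The effect of keeping the product B*C unrounded until after the addition can be illustrated in software. The following is a minimal sketch, not the patented circuitry: it assumes IEEE 754 double precision (N = 53) and models the wider M-bit product with exact rational arithmetic. The function names `unfused_multiply_add` and `fused_multiply_add` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded to N bits first."""
    product = b * c       # rounded to a 53-bit significand
    return a + product    # rounded a second time


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Model of a fused operation: exact product, a single final rounding."""
    # Fraction(float) is exact, so the product and sum carry no rounding error.
    exact = Fraction(a) + Fraction(b) * Fraction(c)
    # Converting back to float performs the one and only rounding step.
    return float(exact)


if __name__ == "__main__":
    # Illustrative operands: b * c = 1 - 2**-54 needs 54 significand bits.
    a = -1.0
    b = 1.0 + 2.0 ** -27
    c = 1.0 - 2.0 ** -27
    print(unfused_multiply_add(a, b, c))
    print(fused_multiply_add(a, b, c))
```

With these operands the exact product is 1 - 2^-54, which does not fit in 53 bits. The unfused path rounds it to 1.0 before the addition and so returns 0.0, while the fused model keeps the extra bit and returns -2^-54.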
Publication number: US8990282(B2) Publication date: 2015.03.24
Application number: US200912585668 Filing date: 2009.09.21
Applicant: ARM Limited Inventor: Lutz David Raymond
Classification: G06F7/485;G06F7/487;G06F7/483;G06F7/544 Main classification: G06F7/485
Agency: Nixon & Vanderhye P.C. Agent: Nixon & Vanderhye P.C.
Principal claim: 1. A data processing apparatus for performing a fused multiply add operation on operands A, B and C to generate a result A+B*C, said operands A, B and C and said result A+B*C being floating point values each having an N-bit significand, said data processing apparatus comprising: multiplying circuitry configured to multiply said operand B and said operand C to generate an unrounded product B*C having an M-bit significand, where M>N; adding circuitry configured to add said unrounded product B*C to said operand A and output a rounded result A+B*C having an N-bit significand; and control circuitry responsive to a fused multiply add instruction to control said multiplying circuitry and said adding circuitry to perform said fused multiply add operation in a plurality of processing cycles; wherein said adding circuitry comprises a first input for receiving, from a register or as a result of a preceding instruction, said operand A at a later processing cycle than a processing cycle at which said operands B and C are input to said multiplying circuitry; and said adding circuitry is controlled by said control circuitry to commence processing of said operand A after said multiplying circuitry has generated said unrounded product B*C, wherein said data processing apparatus is configured to obtain said operand A from the register or the result of the preceding instruction in a later processing cycle than the processing cycle in which said operands B and C are input to said multiplying circuitry, said control circuitry is responsive to a multiply instruction to control said multiplying circuitry to multiply said operands B and C; and said data apparatus is configured to execute said multiply instruction in fewer processing cycles than said fused multiply add instruction.
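The timing relationships recited in the claim can be sketched as a toy schedule. This is an illustration under stated assumptions, not the claimed design: the stage lengths `MUL_CYCLES` and `ADD_CYCLES`, the `Schedule` type, and the functions `schedule_fma` and `schedule_mul` are all hypothetical, since the record gives no cycle counts.

```python
from dataclasses import dataclass

# Assumed stage lengths, chosen only for illustration; the source gives none.
MUL_CYCLES = 3  # cycles for the multiplying circuitry to produce unrounded B*C
ADD_CYCLES = 2  # cycles for the adding circuitry to add A and round


@dataclass
class Schedule:
    bc_input_cycle: int  # cycle at which B and C enter the multiplier
    a_input_cycle: int   # cycle at which A enters the adder
    result_cycle: int    # cycle at which the rounded result is available


def schedule_fma(issue_cycle: int) -> Schedule:
    """Fused multiply add: A is read only once the unrounded product exists."""
    a_input = issue_cycle + MUL_CYCLES
    return Schedule(issue_cycle, a_input, a_input + ADD_CYCLES)


def schedule_mul(issue_cycle: int) -> int:
    """Plain multiply: only the multiplier stage latency is modeled here."""
    return issue_cycle + MUL_CYCLES


if __name__ == "__main__":
    fma = schedule_fma(0)
    print(fma)
    print("multiply result cycle:", schedule_mul(0))
```

In this toy model operand A is input later than operands B and C, and a multiply instruction completes in fewer cycles than a fused multiply add instruction, which matches the two timing conditions stated in the claim.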
Address: Cambridge GB