Abstract |
A pipelined processing unit 50 produces an intermediate result 65 for use in an iterative approximation algorithm in, for example, an odd number of clock cycles. The pipelined processing unit may execute SIMD requests by staggering the start of execution of the requests from a SIMD instruction. When one or more operations for one SIMD iterative approximation algorithm are executing and an operation for another SIMD iterative approximation algorithm is ready to begin execution, control logic causes intermediate results completed by a pipeline stage 57 to pass through a wait stage 58 before being used in a subsequent computation. The delay in the wait stage may provide two open scheduling cycles in which both parts of the next SIMD instruction can begin execution (Table 4 on page 12). Although the wait stage increases the latency to complete an in-progress algorithm, the total execution throughput of the pipeline may increase for multi-pass algorithms such as the Newton-Raphson method. The pipeline length may be varied dynamically, for example from an odd to an even number of stages, to facilitate optimum performance. |
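
The scheduling effect described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the patented control logic: it assumes a SIMD instruction has two parts that start in back-to-back cycles, that each pass of the iterative algorithm re-enters the pipeline as soon as the previous intermediate result is available, and that the pipeline latency is 3 cycles without the wait stage and 4 cycles with it. The function names and the latency values are hypothetical and do not come from the source.

```python
def free_issue_slots(latency, passes, lanes=2):
    """Return the cycles in which the in-progress SIMD operation issues nothing.

    Toy model (assumption): lane k starts pass p in cycle k + p * latency,
    so each pass begins as soon as the previous intermediate result exists.
    """
    busy = {lane + p * latency for lane in range(lanes) for p in range(passes)}
    horizon = (passes - 1) * latency  # only inspect the gaps between passes
    return [cycle for cycle in range(horizon) if cycle not in busy]


def longest_run(cycles):
    """Length of the longest run of consecutive cycle numbers in a sorted list."""
    best = run = 0
    prev = None
    for cycle in cycles:
        run = run + 1 if prev is not None and cycle == prev + 1 else 1
        best = max(best, run)
        prev = cycle
    return best


# Assumed latencies: 3 cycles (odd, no wait stage) and 4 cycles (wait stage added).
odd_gaps = free_issue_slots(latency=3, passes=3)   # [2, 5]
even_gaps = free_issue_slots(latency=4, passes=3)  # [2, 3, 6, 7]

# A second two-part SIMD instruction needs two consecutive open cycles to start.
can_start_without_wait = longest_run(odd_gaps) >= 2   # False
can_start_with_wait = longest_run(even_gaps) >= 2     # True
```

In this model the odd latency leaves only isolated open cycles, so a second two-part instruction cannot begin both parts back to back, while the extra wait cycle opens a pair of adjacent cycles at the cost of one additional cycle per pass for the in-progress algorithm.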