Abstract |
<p>A processor and method that reduce the instruction fetch penalty in the execution of a program sequence of instructions comprise a prefetch instruction that is inserted into the program at a location preceding the instructions to be prefetched. The prefetch instruction is defined by an opcode that specifies a target field, a count field, a cache level field, a flush field, and a trace field. A block of target instructions, starting at the target address and continuing until the count is reached, is prefetched into the instruction cache of the processor so that the instructions are available before the instruction at the target address is executed. The trace field specifies a vector of a path in the program sequence that leads from the prefetch instruction to the target address, and allows the prefetch operation to be aborted if that path is not taken. The cache level field specifies the level of the cache memory into which the instructions are to be prefetched. Finally, the flush field indicates whether all preceding prefetch operations should be discarded. The present invention exposes the prefetch mechanism of the processor to the compiler, thereby increasing performance. By allowing the compiler to schedule appropriate prefetch instructions, the present invention reduces latency by increasing the likelihood that instructions will be in the cache when they are executed, while reducing cache pollution and conserving bandwidth by prefetching only instructions that are likely to be executed.</p> |
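The five fields described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `PrefetchInstr` and `Prefetcher` are hypothetical, addresses are counted in whole instructions, and the trace vector is assumed to be a tuple of taken/not-taken branch outcomes, an encoding the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class PrefetchInstr:
    """Hypothetical model of the prefetch instruction's fields."""
    target: int              # address of the first instruction to prefetch
    count: int               # number of instructions in the block
    cache_level: int         # cache level that receives the block
    flush: bool              # discard all preceding pending prefetches
    trace: Tuple[bool, ...]  # assumed encoding: expected branch outcomes


@dataclass
class Prefetcher:
    """Toy prefetch queue plus one set of cached addresses per cache level."""
    pending: List[PrefetchInstr] = field(default_factory=list)
    caches: Dict[int, Set[int]] = field(default_factory=dict)

    def issue(self, instr: PrefetchInstr) -> None:
        # The flush field drops every prefetch issued before this one.
        if instr.flush:
            self.pending.clear()
        self.pending.append(instr)

    def resolve(self, taken_path: Sequence[bool]) -> None:
        """Complete or abort pending prefetches given observed branch outcomes."""
        for instr in self.pending:
            observed = tuple(taken_path[: len(instr.trace)])
            if observed != instr.trace:
                continue  # path to the target was not taken: abort
            # Fetch `count` instructions starting at `target`.
            block = range(instr.target, instr.target + instr.count)
            self.caches.setdefault(instr.cache_level, set()).update(block)
        self.pending.clear()


pf = Prefetcher()
pf.issue(PrefetchInstr(target=100, count=4, cache_level=1, flush=False, trace=(True, False)))
pf.issue(PrefetchInstr(target=200, count=2, cache_level=0, flush=True, trace=(True,)))
pf.issue(PrefetchInstr(target=300, count=2, cache_level=0, flush=False, trace=(False,)))
pf.resolve([True, False])
```

In this model the first prefetch is discarded by the flush bit of the second, and the third is aborted because its trace does not match the observed path, so only the block at address 200 reaches cache level 0. Real hardware would resolve the trace incrementally as branches execute; the batch `resolve` call is a simplification.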