Title: REDUCING INSTRUCTION MISS PENALTIES IN APPLICATIONS
Abstract: Embodiments include systems and methods for reducing instruction cache miss penalties during application execution. Application code is profiled to determine “hot” code regions likely to experience instruction cache miss penalties. The application code can be linearized into a set of traces that include the hot code regions. Embodiments traverse the traces in reverse, keeping track of instruction scheduling information, to determine where an accumulated instruction latency covered by the code blocks exceeds the amount of latency that can be covered by prefetching. Each time the accumulated latency exceeds that amount, a prefetch instruction can be scheduled in the application code. Some embodiments insert additional prefetches, merge prefetches, and/or adjust the placement of prefetches to account for scenarios such as loops, merging or forking branches, edge confidence values, etc.
Publication No.: US2014195788(A1)    Publication Date: 2014.07.10
Application No.: US201313738811    Filing Date: 2013.01.10
Applicant: ORACLE INTERNATIONAL CORPORATION    Inventors: KALOGEROPULOS Spiros; TIRUMALAI Partha
IPC Class: G06F9/38    Main Class: G06F9/38
Agency:    Agent:
Principal Claim: 1. A system for reducing instruction cache miss penalties in application code execution, the system comprising:
a computer-implemented code profiler, operable to:
determine an instruction cache miss penalty for each of a plurality of code sections of application code, the instruction cache miss penalty indicating a likelihood that execution of the corresponding code section in a target execution environment will result in an instruction cache miss; and
generate execution traces from the application code, each execution trace comprising at least one of the plurality of code sections; and
a computer-implemented prefetcher, in communication with the computer-implemented code profiler, and operable, for each execution trace having a code section with a corresponding instruction cache miss penalty that exceeds a predetermined penalty threshold, to:
traverse a set of code blocks of the execution trace in reverse, starting from a source code block of the execution trace, until an accumulated instruction latency exceeds a prefetch latency by, for each of the set of code blocks, adding a latency covered by the code block to latencies of previously traversed code blocks of the set of code blocks to calculate the accumulated instruction latency, the prefetch latency corresponding to a predicted time to prefetch into an instruction cache a number of code blocks defined by a prefetch chunk size; and
insert a prefetch instruction ahead of a last-traversed code block in the execution trace when the accumulated instruction latency exceeds the prefetch latency.
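The reverse-traversal scheduling described in the abstract and claim can be illustrated with a minimal sketch. Everything below is an assumption for illustration only, not the patented implementation: the `CodeBlock`/`Trace` structures, the `PENALTY_THRESHOLD` and `PREFETCH_LATENCY` constants, and the `schedule_prefetches` helper are all hypothetical names. The sketch walks a linearized trace backward, accumulates per-block instruction latency, and records a prefetch ahead of the last-traversed block each time the accumulated latency exceeds the assumed prefetch latency.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CodeBlock:
    name: str
    latency: int               # estimated cycles of instruction latency covered
    miss_penalty: float = 0.0  # profiled likelihood of an i-cache miss (0..1)

@dataclass
class Trace:
    blocks: List[CodeBlock]    # linearized code blocks of one execution trace
    # map: block name -> index of the block the prefetch should fetch
    prefetches_before: Dict[str, int] = field(default_factory=dict)

PENALTY_THRESHOLD = 0.5  # assumed profiling cutoff for a "hot" code section
PREFETCH_LATENCY = 300   # assumed cycles to prefetch one chunk into the i-cache

def schedule_prefetches(trace: Trace) -> None:
    """Traverse the trace in reverse, accumulating instruction latency;
    whenever the accumulated latency exceeds the prefetch latency, record a
    prefetch ahead of the last-traversed block and reset the accumulator."""
    if not any(b.miss_penalty > PENALTY_THRESHOLD for b in trace.blocks):
        return  # trace has no hot section; no prefetching needed
    accumulated = 0
    target = len(trace.blocks) - 1  # index of code the next prefetch will fetch
    for i in range(len(trace.blocks) - 1, -1, -1):
        accumulated += trace.blocks[i].latency
        if accumulated > PREFETCH_LATENCY:
            # Enough latency is now covered by the blocks between here and the
            # target: a prefetch issued here should complete before execution
            # reaches the target code.
            trace.prefetches_before[trace.blocks[i].name] = target
            accumulated = 0
            target = i - 1  # subsequent prefetches target earlier code
```

As a usage sketch, a four-block trace with latencies 100, 100, 150, 100 and an assumed prefetch latency of 300 cycles accumulates 100 + 150 + 100 = 350 cycles after traversing the last three blocks in reverse, so a single prefetch is recorded ahead of the second block, targeting the trace's final block.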
Address: Redwood City, CA, US