摘要 |
A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.
|