Abstract |
Two techniques address bottlenecks in processors. The first is indirect prefetching, which can be especially useful for graph analytics and sparse matrix applications. In these applications, the addresses of most random memory accesses come from an index array B that the application scans sequentially, so the random accesses are in fact indirect accesses of the form A[B[i]]. A hardware component is introduced to detect this pattern. The hardware can then read B a certain distance ahead and prefetch the corresponding element of A. For example, if the prefetch distance is k, then when B[i] is accessed, the hardware reads B[i+k] and then prefetches A[B[i+k]]. The second technique is partial cacheline accessing. Indirect accesses usually touch random memory locations and use only a small portion of each cacheline. Instead of loading the whole cacheline into the L1 cache, the second technique loads only a part of the cacheline. |
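
The two techniques can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware described above: the function names `prefetch_targets` and `partial_fill`, the 64-byte cacheline size, and the 16-byte partial-fill granularity are all hypothetical values chosen for illustration.

```python
LINE_BYTES = 64    # assumed cacheline size (not specified in the abstract)
SECTOR_BYTES = 16  # assumed granularity of a partial cacheline fill


def prefetch_targets(B, i, k):
    """Model the indirect prefetcher's reaction to a demand access to B[i].

    Returns (j, B[j]) with j = i + k: the index-array slot read ahead and
    the element index of A that would be prefetched, i.e. A[B[i+k]].
    Returns None when i + k runs past the end of B.
    """
    j = i + k
    if j >= len(B):
        return None
    return j, B[j]


def partial_fill(byte_addr):
    """Model a partial cacheline fill for one indirect access.

    Returns (line_base, fill_start, fill_end): the base address of the
    enclosing cacheline and the half-open byte range [fill_start, fill_end)
    that would be loaded instead of the whole line.
    """
    line_base = byte_addr - byte_addr % LINE_BYTES
    fill_start = byte_addr - byte_addr % SECTOR_BYTES
    return line_base, fill_start, fill_start + SECTOR_BYTES


if __name__ == "__main__":
    B = [7, 2, 9, 4, 0]  # index array, scanned sequentially
    k = 2                # prefetch distance
    for i in range(len(B)):
        print(i, prefetch_targets(B, i, k))
    print(partial_fill(220))
```

In this model, an access to B[0] with k = 2 causes B[2] to be read and the element A[9] to be prefetched, and an access to byte address 220 loads only bytes 208 through 223 of the cacheline that starts at 192.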