Abstract
A system and method are disclosed that reduce intrabank conflicts and ensure maximum bandwidth on accesses to strided vectors in a bank-interleaved cache memory. The computer system contains a processor that includes a vector execution unit, a scalar processor unit, a cache controller, and a bank-interleaved cache memory. The vector execution unit retrieves strided vectors of data and instructions stored across a plurality of cache banks of the bank-interleaved cache memory in a way that reduces intrabank conflicts. Given a vector stride $S$, the vector is retrieved by first determining $T$ and $R$ from $S = 2^T R$, where $R$ is odd (so $T$ is the number of trailing zero bits of $S$). Let each cache bank be $2^W$ words wide, let the bank-interleaved cache memory contain $2^P$ sets, and let each set contain $2^N$ cache banks. If $T \le W$, then for $0 \le i < 2^{W-T}$, $0 \le j < 2^P$, and $0 \le k < 2^N$, the words addressed $i + 2^{W-T+N} j + 2^{W-T} k$ are accessed on the same cycle. If $W < T < N$, then for $0 \le j < 2^P$ and $0 \le k < 2^{N-T}$, the words addressed $2^{N-T} j + k$ are accessed on the same cycle. Finally, if $T \ge N$, the vector words are accessed sequentially, in different cycles.
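The following minimal sketch shows how the stride decomposition and the three cases work. It only enumerates the word indices that the abstract states are accessed on one cycle. It does not model the cache or the controller. The function names `decompose_stride` and `same_cycle_words` are hypothetical. Three further points are assumptions and do not come from the source. First, the "words addressed" are treated as element indices of the vector, counted from its first word. Second, the cases are tested in the order the abstract lists them. Third, the $T \ge N$ case is represented as one word per cycle.

```python
def decompose_stride(S: int) -> tuple[int, int]:
    """Split a positive stride S into (T, R) with S == 2**T * R and R odd."""
    if S <= 0:
        raise ValueError("stride must be a positive integer")
    T = (S & -S).bit_length() - 1  # S & -S isolates the lowest set bit; its position is T
    return T, S >> T               # R is what remains after removing the 2**T factor


def same_cycle_words(S: int, W: int, N: int, P: int) -> list[int]:
    """Word indices accessed together on one cycle, per the three cases.

    Hypothetical helper. W: log2 of bank width in words, N: log2 of banks
    per set, P: log2 of the number of sets. Indices are assumed to be
    element offsets within the strided vector.
    """
    T, _R = decompose_stride(S)
    if T <= W:
        # Case T <= W: i + 2^(W-T+N) j + 2^(W-T) k
        return sorted(
            i + 2 ** (W - T + N) * j + 2 ** (W - T) * k
            for i in range(2 ** (W - T))
            for j in range(2 ** P)
            for k in range(2 ** N)
        )
    if T < N:
        # Case W < T < N: 2^(N-T) j + k
        return sorted(
            2 ** (N - T) * j + k
            for j in range(2 ** P)
            for k in range(2 ** (N - T))
        )
    # Case T >= N: words are accessed sequentially, one per cycle (assumed model)
    return [0]


if __name__ == "__main__":
    for stride in (6, 8, 12, 16):
        T, R = decompose_stride(stride)
        words = same_cycle_words(stride, W=2, N=4, P=1)
        print(f"S={stride}: T={T}, R={R}, words per cycle={len(words)}")
```

With $W = 2$, $N = 4$, $P = 1$, the stride $S = 6$ gives $T = 1 \le W$, so 64 indices are produced. The stride $S = 8$ gives $T = 3$, which falls in the middle case and yields 4 indices. The stride $S = 16$ gives $T = 4 \ge N$, which is the sequential case. In both parallel cases, the index sets come out as the contiguous range starting at 0, with each index appearing exactly once. This follows directly from the formulas, because $i$, $k$, and $j$ fill successive bit fields of the index.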