Abstract |
<p>A matrix multiplication method that is particularly well suited for use with a computer having hierarchical memory. The A and B term matrices are partitioned into blocks (120, 122, 152, 154, 124, 126, 156, 158), and each block of the product matrix is generated by computing a sum of a series of outer products. Reads into cache or other faster, high-level storage and writes to main memory are interleaved with the calculations in order to reduce or eliminate processor stalling. Individual product blocks may be computed by separate processors (P1, P2) without requiring communication of intermediate results.</p> |
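As an illustration of the blocking scheme summarized above, the following is a minimal sketch, not the patented implementation. It computes each block of the product as a sum of outer products of a column segment of A and a row segment of B. The function name `blocked_matmul` and the `block` parameter are hypothetical, and the sketch does not model the interleaving of cache reads and main-memory writes with computation, nor the assignment of blocks to separate processors.

```python
def blocked_matmul(a, b, block=2):
    """Compute C = A x B one product block at a time.

    Each product block is accumulated as a sum of outer products.
    `block` (an illustrative parameter) is the block edge length.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            rows = range(i0, min(i0 + block, n))
            cols = range(j0, min(j0 + block, m))
            # Accumulator for one product block. On a machine with
            # hierarchical memory this is the part one would hope to
            # keep in fast storage (an assumption of this sketch).
            acc = [[0.0 for _ in cols] for _ in rows]
            for p in range(k):
                col_a = [a[i][p] for i in rows]  # column segment of A
                row_b = [b[p][j] for j in cols]  # row segment of B
                # Rank-1 (outer product) update of the accumulator.
                for r, x in enumerate(col_a):
                    for s, y in enumerate(row_b):
                        acc[r][s] += x * y
            # Write the finished block back to the result matrix.
            for r, i in enumerate(rows):
                for s, j in enumerate(cols):
                    c[i][j] = acc[r][s]
    return c
```

Because each accumulator depends only on a band of rows of A and a band of columns of B, the iterations of the two outer loops are independent of one another, which is consistent with the statement that separate processors can compute individual blocks without exchanging intermediate results.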