Abstract |
Methods and apparatus provide for transferring blocks of data between a shared memory and one or more of a plurality of parallel processors, each processor including a local memory; executing one or more programs within the local memory of one or more of the processors, wherein the one or more programs are coded such that they do not rely on data caching within the processor; and buffering not more than about three instructions from any local memory in any instruction buffer of any processor, wherein the instruction buffer of each processor is adapted to process instructions with substantially maximal efficiency when the one or more programs are coded such that they do not rely on data caching within the processor. |