Methods and apparatus for parallel processing are provided. A multicore processor is described. The multicore processor may include a distributed memory unit with memory nodes coupled to the processor's cores. The cores may be configured to execute parallel threads, and at least one of the threads may be data-dependent on at least one of the other threads. The distributed memory unit may be configured to proactively send shared memory data from a thread that produces the shared memory data to one or more of the threads.