Abstract |
In a massively parallel computer system embodiment, a compute node receiving a message from an input/output node performs the steps of: obtaining a lock on a collective network device; checking a shared storage location for a message pending for the receiving thread; if such a message is pending, receiving the message's remaining packets directly into the user's buffer, unlocking, and returning; if no such message is pending, receiving one packet from the network device; if the packet indicates that the message is for the receiving thread, receiving the message's remaining packets directly into the user's buffer, unlocking, and returning; and if the packet indicates that the message is for another thread, updating the shared storage location with the thread ID of the other thread, unlocking, waiting for a timeout, locking, and repeating from the checking step. Because packets are received directly into the user's buffer, intermediate data copying is eliminated, with an attendant performance benefit. |
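
The receive steps recited above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the claimed implementation: `CollectiveDevice`, `receive`, and `_drain` are hypothetical names; the device is modeled as an in-memory FIFO; the first packet of each message is assumed to be a header of the form `(thread_id, remaining_packet_count)`; all payload packets are assumed to have arrived already; and a thread that finds a message pending for another thread (or an empty FIFO) is assumed to unlock and wait rather than read from the device.

```python
import threading
import time
from collections import deque


class CollectiveDevice:
    """Toy stand-in for the collective network device (hypothetical)."""

    def __init__(self, packets):
        self.lock = threading.Lock()  # guards the FIFO and `pending`
        self.fifo = deque(packets)    # packets arriving from the I/O node
        self.pending = None           # shared location: (thread_id, packets_left)


def _drain(dev, count, user_buf):
    """Move `count` payload packets straight into the caller's buffer."""
    for _ in range(count):
        user_buf.append(dev.fifo.popleft())


def receive(dev, my_id, user_buf, timeout=0.001):
    """Receive one message addressed to `my_id` directly into `user_buf`."""
    dev.lock.acquire()
    while True:
        if dev.pending is not None and dev.pending[0] == my_id:
            # Another thread already read our header: take the remaining packets.
            count = dev.pending[1]
            dev.pending = None
            _drain(dev, count, user_buf)
            dev.lock.release()
            return
        if dev.pending is None and dev.fifo:
            dest, count = dev.fifo.popleft()  # receive one packet (the header)
            if dest == my_id:
                _drain(dev, count, user_buf)
                dev.lock.release()
                return
            dev.pending = (dest, count)       # record the owning thread's ID
        # Message is for another thread (assumption: also when nothing arrived):
        # unlock, wait for the timeout, lock, and repeat from the check.
        dev.lock.release()
        time.sleep(timeout)
        dev.lock.acquire()


if __name__ == "__main__":
    # Two messages: two payload packets for thread 2, then one for thread 1.
    dev = CollectiveDevice([(2, 2), b"c", b"d", (1, 1), b"a"])
    bufs = {1: [], 2: []}
    threads = [threading.Thread(target=receive, args=(dev, t, bufs[t]))
               for t in bufs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(bufs)
```

In this sketch the payload packets are never staged in an intermediate buffer: whichever thread reads a header either drains the message into its own buffer or records the owner in the shared location and backs off, so each payload packet is moved exactly once.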