摘要 |
In a distributed system that includes multiple machines, a scheduler attempts to schedule a task on a machine that is not currently overloaded with work. If a task is scheduled on a machine that does not yet have copies of the portions of the data set on which the task needs to operate, then that machine obtains copies of those portions from other machines that already have them. Whenever a "source" machine ships a copy of a portion to another "destination" machine in the distributed system, the destination machine persistently stores that copy on the destination machine's persistent storage mechanism. The copy also remains on the source machine. Thus, portions of the data set are automatically replicated whenever those portions are shipped between machines of the distributed system. Each machine in the distributed system has access to "global" information that indicates which machines have which portions of the data set. |