摘要 |
A method, apparatus, system, article of manufacture, and data structure provide the ability to perform a sorted map-reduce job on a cluster. A cluster of two or more computers is defined by installing a map-reduce framework onto each computer and formatting the cluster by identifying the cluster computers, establishing communication between them, and enabling the cluster to function as a unit. Data is placed into the cluster where it is distributed so that each computer contains a portion of the data. A first map function is performed where each computer sorts their respective data and creates an abstraction that is a representation of the data. The abstractions are exchanged and merged to create complete abstraction. A second map function searches the complete abstraction to redistribute and exchange the data across the computers in the cluster. A reduce function is performed in parallel to produce a result.
|