摘要 |
A repartitioning optimizer identifies alternative repartitioning strategies and selects optimal ones, accounting for network transfer utilization and partition sizes in addition to traditional metrics. If prior partitioning was hash-based, the repartitioning optimizer can determine whether a hash-based repartitioning can result in not every computing device providing data to every other computing device. If prior partitioning was range-based, the repartitioning optimizer can determine whether a range-based repartitioning can generate similarly sized output partitions while aligning input and output partition boundaries, increasing the number of computing devices that do not provide data to every other computing device. Individual computing devices, as they are performing a repartitioning, assign a repartitioning index to each individual data element, which represents the computing device to which such a data element is destined. The indexed data is sorted by such repartitioning indices, thereby grouping together all like data, and then stored in a sequential manner. |