摘要 |
The present invention extends to methods, systems, and computer program products for partitioning streaming data. Embodiments of the invention can be used to hash partition a stream of data and thus avoids unnecessary memory usage (e.g., associated with buffering). Hash partitioning can be used to split an input sequence (e.g., a data stream) into multiple partitions that can be processed independently. Other embodiments of the invention can be used to hash repartition a plurality of streams of data. Hash repartitioning converts a set of partitions into another set of partitions with the hash partitioned property. Partitioning and repartitioning can be done in a streaming manner at runtime by exchanging values between worker threads responsible for different partitions.
|