Abstract |
A data record transformation that quickly computes histograms and aggregations for an incoming record stream. The data record transformation computes histograms and aggregations in one step, thereby avoiding the creation of a large intermediate result. The transformation operates in a streaming fashion on each record in the incoming record stream, so little memory is required because only one record or a few records are processed at a time. According to a first embodiment, a method, system, and computer program product for transforming sorted data records are provided. A data transformation unit includes a binning module and a histogram aggregation module. The histogram aggregation module processes each binned and sorted record to form an aggregate record in a histogram format in one step. Data received in each incoming binned and sorted record is expanded and accumulated into the aggregate record having matching group-by fields. According to a second embodiment, a method, system, and computer program product for transforming unsorted data records are provided. An associative data structure holds a collection of partially aggregated histogram records. A histogram aggregation module processes each binned record to form an aggregate record in a histogram format in one step. Input records from the unordered record stream are matched against the collection of partially aggregated histogram records, then expanded and accumulated into the aggregate histogram record having matching group-by fields.
|
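The two embodiments summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The record layout `(group_key, bin_index, value)`, the function names `aggregate_sorted` and `aggregate_unsorted`, and the use of a sum as the aggregation are all assumptions made for illustration. Binning is assumed to have already happened, so each record carries its bin index.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (group-by key, bin index, value to aggregate).
Record = Tuple[str, int, float]


def aggregate_sorted(
    records: Iterable[Record], num_bins: int
) -> Iterator[Tuple[str, List[float]]]:
    """First embodiment (sketch): input is binned and sorted by group key.

    Only one aggregate histogram record is held in memory at a time; it is
    emitted as soon as the group-by key changes.
    """
    current_key = None
    hist: List[float] = []
    for key, bin_index, value in records:
        if key != current_key:
            if current_key is not None:
                yield current_key, hist  # group finished, emit it
            current_key = key
            hist = [0.0] * num_bins  # start a fresh aggregate record
        # "Expand" the record into its histogram slot and accumulate.
        hist[bin_index] += value
    if current_key is not None:
        yield current_key, hist  # emit the last group


def aggregate_unsorted(
    records: Iterable[Record], num_bins: int
) -> Dict[str, List[float]]:
    """Second embodiment (sketch): input arrives in arbitrary order.

    A dict plays the role of the associative data structure that holds the
    partially aggregated histogram records, keyed by the group-by field.
    """
    partial: Dict[str, List[float]] = {}
    for key, bin_index, value in records:
        hist = partial.get(key)
        if hist is None:
            hist = partial[key] = [0.0] * num_bins
        hist[bin_index] += value
    return partial


if __name__ == "__main__":
    sorted_stream = [("a", 0, 1.0), ("a", 2, 2.0), ("b", 1, 5.0)]
    for key, hist in aggregate_sorted(sorted_stream, num_bins=3):
        print(key, hist)

    unsorted_stream = [("b", 1, 5.0), ("a", 0, 1.0), ("a", 2, 2.0), ("b", 1, 1.0)]
    print(aggregate_unsorted(unsorted_stream, num_bins=3))
```

In both functions each input record is consumed once and folded directly into a histogram-shaped aggregate, so no intermediate table of per-record rows is built. The sorted variant keeps a single aggregate record in memory, while the unsorted variant's memory use grows with the number of distinct group-by keys.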