Abstract |
A system, method, and apparatus are provided for supporting and/or executing count-distinct queries. A large set of data (e.g., tens or hundreds of millions of event records) is condensed daily to generate presence bitmaps that reflect the distinctiveness of a selected data dimension S (e.g., user ID) for one or more key dimensions g1, g2, ... (e.g., advertisement ID, campaign ID, advertiser ID). The condensation process eliminates duplication and yields a single value (e.g., 1 or 0) for each tuple [S, g1, ...] to represent the distinctiveness of each value in the S dimension with respect to each combination of values in the grouping dimensions. On a monthly basis, the daily values are condensed to yield a single value for the month, and a similar process is applied at any other desired time granularity (e.g., year). The condensed data may be generated for any combination of selected dimension(s) and grouping dimension(s). |
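The condensation described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the event layout `(day, user_id, ad_id)`, the function names `condense_daily` and `condense_monthly`, and the choice of a plain integer as the bitmap (one bit per day of the month, then one bit per month of the year) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def condense_daily(events):
    """Condense raw events into one day-level presence bitmap per key and month.

    Hypothetical layout: each event is (day, user_id, ad_id), where user_id
    plays the role of the selected dimension S and ad_id the grouping
    dimension g1. Bit (day_of_month - 1) is set when the key was seen that day.
    """
    bitmaps = defaultdict(int)
    for day, user_id, ad_id in events:
        key = (user_id, ad_id)
        # OR-ing the same bit twice is a no-op, so duplicates vanish here.
        bitmaps[(key, day.year, day.month)] |= 1 << (day.day - 1)
    return dict(bitmaps)


def condense_monthly(daily):
    """Condense day-level bitmaps into one month-level bitmap per key and year."""
    monthly = defaultdict(int)
    for (key, year, month), bits in daily.items():
        if bits:  # the key was present on at least one day of this month
            monthly[(key, year)] |= 1 << (month - 1)
    return dict(monthly)


events = [
    (date(2024, 1, 3), "u1", "ad7"),
    (date(2024, 1, 3), "u1", "ad7"),  # duplicate event, absorbed by the OR
    (date(2024, 1, 5), "u1", "ad7"),
    (date(2024, 2, 1), "u2", "ad7"),
]
daily = condense_daily(events)
monthly = condense_monthly(daily)
```

In this sketch the two January events for `u1` on the same day collapse into a single set bit, which is the deduplication step the abstract refers to; the monthly pass then keeps only one bit per month.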
Main claim |
1. A method of determining distinctiveness within multi-dimensional data condensed in a selected dimension, the method comprising:
receiving a query regarding distinctiveness of the multi-dimensional data within a specified range of time, across:
the selected dimension; and
one or more dimensions other than the selected dimension;
for each unique key comprising a value in the selected dimension and values in the one or more other dimensions, accessing, with a computer, at least one associated presence bitmap comprising separate indicators corresponding to each of multiple time periods; and
aggregating a count of unique keys for which at least one associated presence bitmap comprises an indicator having a first value and corresponding to a time period within the specified range of time. |
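The query step of the claim can be sketched as follows, under stated assumptions: each unique key is a hypothetical `(user_id, ad_id)` pair, its presence bitmap is a Python integer covering one month with bit `d - 1` standing for day `d`, and the "first value" of an indicator is 1. The names `range_mask` and `count_distinct` are illustrative and do not come from the patent.

```python
def range_mask(first_day, last_day):
    """Return a mask with bits set for days first_day..last_day (1-based, inclusive)."""
    width = last_day - first_day + 1
    return ((1 << width) - 1) << (first_day - 1)


def count_distinct(bitmaps, first_day, last_day):
    """Count, per grouping value, the keys present on any day in the range.

    `bitmaps` maps (selected_value, grouping_value) -> integer bitmap.
    A key is counted once if any indicator inside the range is set.
    """
    mask = range_mask(first_day, last_day)
    counts = {}
    for (_selected, grouping), bits in bitmaps.items():
        if bits & mask:  # at least one set indicator within the time range
            counts[grouping] = counts.get(grouping, 0) + 1
    return counts


# Hypothetical condensed data for a single month.
bitmaps = {
    ("u1", "ad7"): 0b10100,   # present on days 3 and 5
    ("u2", "ad7"): 0b1,       # present on day 1
    ("u1", "ad9"): 1 << 19,   # present on day 20
}

first_four_days = count_distinct(bitmaps, 1, 4)
day_four_only = count_distinct(bitmaps, 4, 4)
rest_of_month = count_distinct(bitmaps, 5, 31)
```

Because each key contributes at most one to the count regardless of how many indicators are set in the range, the result is a distinct count over the selected dimension for each grouping value, obtained without rescanning the raw event records.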