Title of invention: ENABLING AND PERFORMING COUNT-DISTINCT QUERIES ON A LARGE SET OF DATA
Abstract: A system, method, and apparatus are provided for supporting and/or executing count-distinct queries. A large set of data (e.g., tens or hundreds of millions of event records) is condensed daily to generate presence bitmaps that reflect the distinctiveness of a selected data dimension S (e.g., user ID) for one or more key dimensions g1, g2, ... (e.g., advertisement ID, campaign ID, advertiser ID). The condensation process eliminates duplication and yields a single value (e.g., 1 or 0) for each tuple [S, g1, ...] to represent the distinctiveness of each value in the S dimension with respect to each combination of values in the grouping dimensions. On a monthly basis, the daily values are condensed to yield a single value for the month, and a similar process is applied at any other desired time granularity (e.g., year). The condensed data may be generated for any combination of selected dimension(s) and grouping dimension(s).
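The condensation described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that all events fall within one calendar month, that each tuple's presence bitmap is a Python integer with one bit per day, and that the function names `build_daily_bitmaps` and `condense_to_month` are hypothetical; the patent record does not specify any of these details.

```python
from collections import defaultdict
from datetime import date


def build_daily_bitmaps(events):
    """Condense raw events into one presence bitmap per (S, g1, ...) tuple.

    Each event is (day, s, g1, ...). Bit (day.day - 1) of the bitmap is set
    when the tuple was seen on that day. Duplicate events on the same day
    collapse into the same bit, which removes duplication.
    Assumption: every event belongs to the same calendar month.
    """
    bitmaps = defaultdict(int)
    for day, s, *groups in events:
        key = (s, *groups)
        bitmaps[key] |= 1 << (day.day - 1)
    return dict(bitmaps)


def condense_to_month(daily_bitmaps):
    """Roll the daily bits up into a single 1/0 value for the month."""
    return {key: 1 if bits else 0 for key, bits in daily_bitmaps.items()}


# Hypothetical event records: (day, user ID, advertisement ID).
events = [
    (date(2014, 5, 1), "user_a", "ad_1"),
    (date(2014, 5, 1), "user_a", "ad_1"),  # duplicate, absorbed by the bitmap
    (date(2014, 5, 3), "user_a", "ad_1"),
    (date(2014, 5, 3), "user_b", "ad_1"),
    (date(2014, 5, 7), "user_a", "ad_2"),
]

daily = build_daily_bitmaps(events)
monthly = condense_to_month(daily)
```

Here `daily` holds one integer per tuple whose set bits mark the days of presence, and `monthly` holds the single per-month indicator obtained from them. The same roll-up could in principle be repeated for coarser granularities such as a year.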
Publication number: US2015161186(A1) Publication date: 2015.06.11
Application number: US201414284121 Filing date: 2014.05.21
Applicant: LinkedIn Corporation Inventor: Vemuri Srinivas S.
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim: 1. A method of determining distinctiveness within multi-dimensional data condensed in a selected dimension, the method comprising: receiving a query regarding distinctiveness of the multi-dimensional data within a specified range of time, across: the selected dimension; and one or more dimensions other than the selected dimension; for each unique key comprising a value in the selected dimension and values in the one or more other dimensions, accessing, with a computer, at least one associated presence bitmap comprising separate indicators corresponding to each of multiple time periods; and aggregating a count of unique keys for which at least one associated presence bitmap comprises an indicator having a first value and corresponding to a time period within the specified range of time.
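The query step in the claim can be sketched as follows. This is one possible reading, not the claimed implementation: it assumes each unique key has a single integer bitmap with one bit per day of a month, that the "first value" of an indicator is a set bit, and that counts are reported per combination of the other dimensions. The names `range_mask` and `count_distinct` are hypothetical.

```python
from collections import Counter


def range_mask(first_day, last_day):
    """Build a mask with bits set for days first_day..last_day (1-based, inclusive)."""
    width = last_day - first_day + 1
    return ((1 << width) - 1) << (first_day - 1)


def count_distinct(bitmaps, first_day, last_day):
    """Count unique keys with at least one set indicator inside the time range.

    bitmaps maps (s, g1, ...) to an integer presence bitmap. A key is counted
    once if any bit within the requested range is set; the count is grouped
    by the key's values in the other dimensions (g1, ...).
    """
    mask = range_mask(first_day, last_day)
    counts = Counter()
    for (s, *groups), bits in bitmaps.items():
        if bits & mask:
            counts[tuple(groups)] += 1
    return dict(counts)


# Hypothetical bitmaps: (user ID, advertisement ID) -> days of presence.
bitmaps = {
    ("user_a", "ad_1"): 0b0000101,  # present on days 1 and 3
    ("user_b", "ad_1"): 0b0000100,  # present on day 3
    ("user_a", "ad_2"): 0b1000000,  # present on day 7
}

first_two_days = count_distinct(bitmaps, 1, 2)
first_week = count_distinct(bitmaps, 1, 7)
```

Because each key contributes at most once regardless of how many indicators are set in the range, the per-group totals are distinct counts of the selected dimension; summing them gives the overall number of qualifying keys.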
Address: Mountain View CA US