Title of invention: Method for estimating the number of distinct values in a partitioned dataset
Abstract: The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. The present invention provides synopses for DV estimation in the setting of a partitioned dataset, as well as corresponding DV estimators that exploit these synopses. Whenever an output compound data partition is created via a multiset operation on a pair of (possibly compound) input partitions, the synopsis for the output partition can be obtained by combining the synopses of the input partitions. If the input partitions are compound partitions, it is not necessary to access the synopses for all the base partitions that were used to construct the input partitions. Superior (in certain cases near-optimal) accuracy in DV estimates is maintained, especially when the synopsis size is small. The synopses can be created in parallel, and can also handle deletions of individual partition elements.
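The combining property described in the abstract can be illustrated with a minimal sketch. This record does not specify the synopsis structure, so the code below assumes a k-minimum-values (KMV) style synopsis as a stand-in: each partition keeps the k smallest hash values of its elements, and the synopsis of a multiset union is built from the two input synopses alone. The function names (`hash_unit`, `build_synopsis`, `combine_union`, `estimate_dv`), the choice of SHA-256, and the estimator (k - 1) / (k-th smallest hash) are illustrative assumptions, not the claimed method. Deletion handling and other multiset operations are not covered.

```python
import hashlib


def hash_unit(value):
    """Map a value to a pseudo-uniform float in (0, 1] (illustrative choice of hash)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def build_synopsis(partition, k):
    """Return the k smallest distinct hash values of a partition, sorted ascending.

    Duplicates in the partition hash to the same value, so they collapse here.
    Each partition can be processed independently, e.g. in parallel.
    """
    return sorted({hash_unit(v) for v in partition})[:k]


def combine_union(syn_a, syn_b, k):
    """Synopsis of the multiset union, computed from the input synopses only."""
    return sorted(set(syn_a) | set(syn_b))[:k]


def estimate_dv(synopsis, k):
    """Estimate the number of distinct values from a synopsis of target size k."""
    if len(synopsis) < k:
        # Fewer than k distinct values were seen, so the count is exact
        # (ignoring hash collisions).
        return float(len(synopsis))
    # Assumed KMV-style estimator: (k - 1) / (k-th smallest hash value).
    return (k - 1) / synopsis[-1]
```

In this sketch, a compound partition built from earlier unions only ever needs the two synopses of its direct inputs, which mirrors the property in the abstract that the base-partition synopses need not be revisited.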
Publication number: US7987177(B2) Publication date: 2011.07.26
Application number: US20080022601 Filing date: 2008.01.30
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: BEYER KEVIN SCOTT;GEMULLA RAINER;HAAS PETER JAY;REINWALD BERTHOLD;SISMANIS JOHN
Classification: G06F17/00;G06F17/30 Main classification: G06F17/00
Agency: Agent:
Main claim:
Address: