Title of invention: Techniques for estimating item frequencies in large data sets
Abstract: Techniques for estimating the frequencies of items (e.g., data items or objects) in large data sets are disclosed. For example, a technique for determining items and their frequencies at multiple levels of interest in a collection of nested bags includes the following steps. A hierarchy of a plurality of levels of nested bags, together with the levels of interest, is received as input. A subset of bags is sampled from at least one of the plurality of levels. At each level of interest, the frequency of each distinct item in the bags obtained in the sampling step is counted. At each level of interest, the item frequencies obtained in the counting step are extrapolated based on the sampling ratios associated with the sampling step. At each level of interest, the items are sorted according to the frequencies obtained from the extrapolating step, and the items with the highest frequencies are retained. A bag may refer to one or more subsets or groups of data items or objects, and a bag may itself contain one or more other bags.
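The steps in the abstract (sample, count, extrapolate, sort and retain) can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything here is an assumption made for illustration: the names `leaf_items` and `estimate_frequencies`, the representation of a bag as a nested Python list, uniform sampling of child bags without replacement, and the reading of "frequency at a level" as the total number of occurrences of an item inside the sampled bags at that depth.

```python
import random
from collections import Counter


def leaf_items(bag):
    """Yield every item stored in `bag` or in any bag nested inside it."""
    for element in bag:
        if isinstance(element, list):  # assumption: a list is a nested bag
            yield from leaf_items(element)
        else:
            yield element


def estimate_frequencies(root, levels_of_interest, ratios, top_k, seed=0):
    """Estimate the top_k item frequencies at each level of interest.

    root:               nested lists; the root bag is depth 0.
    levels_of_interest: set of depths for which estimates are wanted.
    ratios:             dict mapping a depth to the fraction of child bags
                        sampled at that depth (missing depths default to 1.0).
    """
    rng = random.Random(seed)
    frontier = [(root, 1.0)]  # pairs of (sampled bag, extrapolation weight)
    estimates = {}
    depth = 0
    while frontier:
        if depth in levels_of_interest:
            counts = Counter()
            for bag, weight in frontier:
                # Count each occurrence, scaled by the inverse sampling ratio.
                for item in leaf_items(bag):
                    counts[item] += weight
            # Sort by estimated frequency and retain the most frequent items.
            estimates[depth] = counts.most_common(top_k)

        # Sample child bags to form the next level down.
        ratio = ratios.get(depth + 1, 1.0)
        next_frontier = []
        for bag, weight in frontier:
            children = [e for e in bag if isinstance(e, list)]
            if not children:
                continue
            keep = max(1, round(ratio * len(children)))
            scale = len(children) / keep  # inverse of the realized ratio
            for child in rng.sample(children, keep):
                next_frontier.append((child, weight * scale))
        frontier = next_frontier
        depth += 1
    return estimates


if __name__ == "__main__":
    corpus = [["a", "a", "b"], ["a", "c"], ["b"]]
    print(estimate_frequencies(corpus, {1}, {1: 1.0}, top_k=2))
```

With a sampling ratio of 1.0 the estimates equal the exact counts. With smaller ratios each sampled bag stands in for several unsampled ones, so the weights are multiplied down the hierarchy and the resulting frequencies are estimates rather than exact values.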
Publication number: US8489645(B2)  Publication date: 2013.07.16
Application number: US20040950800  Filing date: 2004.09.27
Applicant: MIHAILA GEORGE ANDREI; WANG MIN; INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventor: MIHAILA GEORGE ANDREI; WANG MIN
Classification: G06F7/00  Main classification: G06F7/00
Agency:  Agent:
Principal claim:
Address: