Title: GENERATING HISTOGRAMS OF POPULATION DATA BY SCALING FROM SAMPLE DATA
Abstract: Histograms formed from samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms that estimate the distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples is observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of the distinct values of an overall population, a "Chao" estimator can optionally be used as a lower bound of the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. Also, a "sum of the parts" mathematical relationship can be taken into account during scaling, namely that the scaled distinct values for each bin of an estimate histogram should sum to the estimate of the total distinct values of the entire population.
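The estimation and scaling steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the function names, the `prior_estimate` parameter, and the per-bin caps are hypothetical, and the sketch assumes the commonly cited Chao form d + f1^2 / (2 * f2), where d is the number of distinct sample values and f1 and f2 are the numbers of values seen exactly once and exactly twice. The abstract does not say which variant is intended, nor how a zero f2 is handled, so the fallback used here is also an assumption.

```python
from collections import Counter


def chao_estimate(sample):
    """Chao-style estimate of distinct values: d + f1^2 / (2 * f2)."""
    counts = Counter(sample)
    d = len(counts)                          # distinct values in the sample
    freq_of_freq = Counter(counts.values())
    f1 = freq_of_freq[1]                     # values seen exactly once
    f2 = freq_of_freq[2]                     # values seen exactly twice
    if f2 == 0:
        # Assumption: use the bias-corrected form to avoid dividing by zero.
        return d + f1 * (f1 - 1) / 2
    return d + f1 * f1 / (2 * f2)


def estimate_distinct(sample, prior_estimate, domain_size):
    """Use the Chao value as a lower bound, then clamp to the domain limit."""
    return min(max(prior_estimate, chao_estimate(sample)), domain_size)


def scale_bin_distincts(bin_distincts, total_estimate, bin_caps):
    """Scale per-bin distinct counts so that they sum to total_estimate.

    No bin may exceed its cap (the domain knowledge). The share that a
    capped bin cannot absorb is redistributed over the remaining bins.
    """
    scaled = [0.0] * len(bin_distincts)
    open_bins = set(range(len(bin_distincts)))
    remaining = float(total_estimate)
    while open_bins:
        weight = sum(bin_distincts[i] for i in open_bins)
        if weight == 0:
            break
        factor = remaining / weight
        over = [i for i in open_bins if bin_distincts[i] * factor > bin_caps[i]]
        if not over:
            for i in open_bins:
                scaled[i] = bin_distincts[i] * factor
            break
        for i in over:
            scaled[i] = float(bin_caps[i])
            remaining -= bin_caps[i]
            open_bins.remove(i)
    return scaled
```

If the total estimate exceeds the sum of all caps, every bin ends up at its cap and the scaled values sum to less than the total; the sketch leaves that case unresolved rather than guess at the intended behavior.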
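Publication number: US2008059125(A1) Publication date: 2008.03.06
Application number: US20060469855 Filing date: 2006.09.02
Applicant: MICROSOFT CORPORATION Inventors: FRASER CAMPBELL BRYCE; JOSE IAN; ZABBACK PETER ALFRED
Classification: G06F19/00; G06F17/18; G06F17/40 Main classification: G06F19/00
Agency: Agent:
Principal claim:
Address: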