Title of invention: Flexible, efficient and scalable sampling
Abstract: A sampling infrastructure (scheme) that supports flexible, efficient, scalable and uniform sampling is disclosed. A sample of a partitioned subset of the data is maintained in compact histogram form while the sample footprint stays below a specified upper bound. If at any point the sample footprint exceeds the upper bound, the compact representation is abandoned and the sample is purged to obtain a subsample. The histogram of the purged subsample is expanded to a bag of values while the remaining data values of the partitioned subset are sampled. The expanded subsample is then converted back to a histogram, yielding a uniform random sample. The scheme retains the bounded-footprint property, and to a partial degree the compact representation, of the Concise Sampling scheme while ensuring statistical uniformity. Samples from two or more partitioned subsets can be merged on demand to yield uniform samples of the combined partitions; the merged samples also keep the histogram representation and the bounded-footprint property.
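The purge-and-expand scheme summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the footprint is assumed to be the number of distinct values in the histogram, the purge keeps a uniform subsample of `max_distinct` values, and the values that arrive after the purge are handled with standard reservoir sampling. The names `HybridSampler` and `merge` are hypothetical.

```python
import random
from collections import Counter


class HybridSampler:
    """Hypothetical bounded-footprint sampler for one partitioned subset.

    Assumption: footprint = number of distinct values in the histogram.
    Phase 1 keeps an exact histogram {value: count}. When that histogram
    exceeds `max_distinct` entries, it is purged to a uniform subsample,
    expanded to a bag (list), and the remaining values are reservoir-sampled.
    """

    def __init__(self, max_distinct, rng=None):
        self.bound = max_distinct
        self.rng = rng if rng is not None else random.Random()
        self.hist = Counter()   # compact, exact representation (phase 1)
        self.bag = None         # expanded bag of sampled values (phase 2)
        self.seen = 0           # number of values processed so far

    def add(self, value):
        self.seen += 1
        if self.bag is None:
            self.hist[value] += 1
            if len(self.hist) > self.bound:
                self._purge()
        else:
            # Reservoir step: keep the new value with probability len(bag)/seen,
            # overwriting a uniformly chosen slot.
            j = self.rng.randrange(self.seen)
            if j < len(self.bag):
                self.bag[j] = value

    def _purge(self):
        # Expand the histogram and keep a uniform subsample of `bound` values;
        # such a bag can never hold more than `bound` distinct values.
        population = list(self.hist.elements())
        self.bag = self.rng.sample(population, self.bound)
        self.hist = Counter()

    def sample(self):
        """Return (histogram of a uniform sample, size of the partition)."""
        values = self.hist if self.bag is None else self.bag
        return Counter(values), self.seen


def merge(h1, n_pop1, h2, n_pop2, bound, rng=None):
    """Merge uniform samples of two disjoint partitions (hypothetical helper).

    h1, h2 are histograms of simple random samples drawn without replacement
    from partitions of sizes n_pop1 and n_pop2.
    """
    rng = rng if rng is not None else random.Random()
    size1, size2 = sum(h1.values()), sum(h2.values())
    combined = h1 + h2
    if size1 == n_pop1 and size2 == n_pop2 and len(combined) <= bound:
        return combined  # both inputs exact and the union still fits the bound
    k = min(size1, size2, bound)  # merged size both inputs can always supply
    # Hypergeometric draw: how many of k items, drawn without replacement from
    # the combined population, come from partition 1.
    from1, left1, left_total = 0, n_pop1, n_pop1 + n_pop2
    for _ in range(k):
        if rng.randrange(left_total) < left1:
            from1 += 1
            left1 -= 1
        left_total -= 1
    part1 = rng.sample(list(h1.elements()), from1)
    part2 = rng.sample(list(h2.elements()), k - from1)
    return Counter(part1 + part2)
```

Taking the per-partition counts from a hypergeometric draw is what keeps the merged histogram a uniform sample of the combined partitions. Choosing `k = min(size1, size2, bound)` is a simplifying assumption of this sketch; it can shrink the merged sample when one input is small, and it does not attempt to reproduce the patent's own merge rules.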
Publication number: US7543006(B2); Publication date: 2009.06.02
Application number: US20060469231; Filing date: 2006.08.31
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION; Inventors: BROWN PAUL GEOFFREY; HAAS PETER JAY
Classification: G06F17/30; Main classification: G06F17/30