Abstract
An index for large databases is disclosed. Data is grouped into clusters, and the clusters are grouped into levels of detail. Analysis results are computed by progressive data sampling, conducted according to the level of detail required, the time or computing resources available, or both. Larger, more concentrated clusters at higher levels of detail are sampled more sparsely; smaller, more diffuse clusters at lower levels of detail are sampled more intensively. Analysis results, including outlier data, draw proportional representation from the whole database up to the required level of detail. An initial sample quickly yields results to a specified degree of accuracy, and subsequent sampling refines them.
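The abstract does not specify how clusters are built, how levels are numbered, or what sampling rates are used, so the following is only a minimal sketch of the progressive, cluster-weighted sampling idea under stated assumptions. It assumes the clusters already exist, each tagged with a hypothetical integer `level`, and that only clusters with `level <= max_level` are analysed. Each round draws about `per_round_scale * sqrt(n)` new records from a cluster of `n` records (a square-root rule chosen here purely for illustration), so large clusters are sampled at a lower rate and small ones at a higher rate. The statistic is a mean, combined as a size-weighted (stratified) estimate so that every cluster contributes in proportion to its share of the data. The names `Cluster`, `progressive_mean`, `max_level`, and `per_round_scale` are hypothetical.

```python
import math
import random
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass
class Cluster:
    """A group of records; `level` is a hypothetical level-of-detail tag."""
    values: list[float]
    level: int


def progressive_mean(
    clusters: list[Cluster],
    max_level: int,
    rounds: int,
    per_round_scale: float,
    seed: int = 0,
) -> Iterator[float]:
    """Yield a refined estimate of the overall mean after each round.

    Assumption: each round draws ceil(per_round_scale * sqrt(n)) new records
    from a cluster of size n, so the sampled fraction shrinks as n grows.
    """
    rng = random.Random(seed)
    active = [c for c in clusters if c.level <= max_level and c.values]
    if not active:
        raise ValueError("no non-empty clusters at or below max_level")
    total = sum(len(c.values) for c in active)
    # Shuffle a private copy of each cluster once; taking longer prefixes
    # in later rounds then samples without replacement across rounds.
    pools = [rng.sample(c.values, len(c.values)) for c in active]
    drawn = [0] * len(active)
    for _ in range(rounds):
        for i, c in enumerate(active):
            n = len(c.values)
            k = max(1, math.ceil(per_round_scale * math.sqrt(n)))
            drawn[i] = min(n, drawn[i] + k)
        # Weight each cluster's sample mean by the cluster's size so the
        # result reflects the whole data set, including small clusters.
        yield sum(
            len(c.values) * (sum(pools[i][: drawn[i]]) / drawn[i])
            for i, c in enumerate(active)
        ) / total
```

With `per_round_scale=1.0`, a 10,000-record cluster receives 100 new draws per round (1% of its records), while a 20-record cluster receives 5 (25%). Stopping after fewer rounds trades accuracy for time; each further round refines the estimate until every cluster has been drawn completely, at which point the result equals the exact mean of the included clusters.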