Title: DATABASE INDEX FOR CONSTRUCTING LARGE SCALE DATA LEVEL OF DETAILS
Abstract: An index for large databases is disclosed. Data is grouped into clusters and the clusters are grouped into levels of detail. Analysis results are determined based on progressive data sampling. Sampling is conducted based on the level of detail required and/or the resources (time or computing resources) that are available. Larger, more concentrated clusters, at higher levels of detail, are sampled more sparsely. Smaller, more diffuse clusters, at lower levels of detail, are sampled more intensively. Analysis results, including outlier data, include proportional representation from the whole database up to the level of detail required. Results are quickly determined with a specified degree of accuracy, based on initial sampling, and are refined with subsequent sampling.
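The sampling behaviour described in the abstract can be illustrated with a small sketch. This is not the patented method: the rate rule `sample_fraction`, the `base` parameter, the fixed number of rounds, and the choice of a weighted mean as the "analysis result" are all assumptions made for illustration. The sketch only shows the general shape: large clusters are sampled sparsely, small clusters intensively, each cluster contributes in proportion to its share of the data, and the estimate is refined as later rounds extend the sample.

```python
import random


def sample_fraction(cluster_size, largest_size, base=0.1):
    """Hypothetical rate rule: the largest cluster is sampled at `base`;
    smaller clusters get proportionally higher rates, capped at 1.0."""
    return min(1.0, base * largest_size / cluster_size)


def progressive_mean(clusters, rounds=3, base=0.1, seed=0):
    """Return one estimate of the overall mean per sampling round.

    `clusters` maps a cluster id to a list of numeric values (assumed layout).
    """
    rng = random.Random(seed)
    largest = max(len(vals) for vals in clusters.values())
    total = sum(len(vals) for vals in clusters.values())
    # Shuffle each cluster once so every round extends the previous sample.
    shuffled = {cid: rng.sample(vals, len(vals)) for cid, vals in clusters.items()}

    estimates = []
    for r in range(1, rounds + 1):
        estimate = 0.0
        for vals in shuffled.values():
            frac = min(1.0, r * sample_fraction(len(vals), largest, base))
            k = max(1, round(frac * len(vals)))
            sample = vals[:k]
            # Weight each cluster's sample mean by its share of the database.
            estimate += (len(vals) / total) * (sum(sample) / k)
        estimates.append(estimate)
    return estimates


# A large dense cluster and a small cluster holding outlier values.
clusters = {
    "dense": [float(v) for v in range(100)],
    "outliers": [1000.0, 2000.0],
}
estimates = progressive_mean(clusters, rounds=10)
true_mean = sum(sum(vals) for vals in clusters.values()) / 102
```

In this sketch the two outlier values are fully sampled from the first round, while the dense cluster starts at a 10% sample and grows by 10% per round, so the tenth estimate coincides with the exact mean.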
Publication number: US2016364421(A1) Publication date: 2016.12.15
Application number: US201514735246 Filing date: 2015.06.10
Applicant: International Business Machines Corporation Inventors: Huang Wei; Liu Jing Jing; Tao DaJiang; Wang Chen; Zhao Sheng; Zhou Zan
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. A method comprising: receiving a plurality of record identifiers with each record identifier uniquely identifying a record in a database; performing cluster analysis on the records corresponding to the plurality of record identifiers to yield a plurality of clusters, with each cluster including at least one record; and constructing a database index data structure where: (i) each record identifier is represented as a leaf node; (ii) each cluster is represented as a non-leaf node; and (iii) each leaf node is related to at least one non-leaf node based upon which record identifiers belong to which clusters.
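The three steps of the claim (receive record identifiers, cluster the records, build an index of leaf and non-leaf nodes) can be sketched as follows. The claim does not name a clustering algorithm or a node layout, so everything concrete here is an assumption: `cluster_records` is a stand-in that buckets records by a single numeric value, and the `LeafNode` and `ClusterNode` classes are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LeafNode:
    record_id: str  # uniquely identifies one record in the database


@dataclass
class ClusterNode:
    cluster_id: int
    leaves: list = field(default_factory=list)  # leaf nodes in this cluster


def cluster_records(records, width=10.0):
    """Stand-in for cluster analysis (assumption): bucket each record by
    the integer part of value / width. A real system would use a proper
    clustering algorithm here."""
    return {rid: int(value // width) for rid, value in records.items()}


def build_index(records, width=10.0):
    """Build a mapping of cluster id to non-leaf node, with one leaf per record id."""
    assignment = cluster_records(records, width)
    clusters = {}
    for record_id in records:
        cid = assignment[record_id]
        # (ii) each cluster is a non-leaf node, created on first use.
        node = clusters.setdefault(cid, ClusterNode(cid))
        # (i) each record identifier is a leaf; (iii) it is attached to the
        # non-leaf node of the cluster it belongs to.
        node.leaves.append(LeafNode(record_id))
    return clusters


# Hypothetical input: record identifier -> one numeric attribute of the record.
records = {"r1": 1.5, "r2": 3.0, "r3": 12.0, "r4": 47.2, "r5": 14.9}
index = build_index(records)
for cid in sorted(index):
    print(cid, [leaf.record_id for leaf in index[cid].leaves])
```

Every cluster node in this sketch holds at least one leaf, because a node is only created when a record is assigned to it, which matches the claim's requirement that each cluster include at least one record.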
Address: Armonk NY US