摘要 |
Index structures and query processing framework that enforces a given threshold on the overhead of computing conjunctive keyword queries. This includes a keyword processing algorithm, logic to determine which indexes to materialize, and a probabilistic approach to reducing the overhead for determining which indexes to build. The index structures leverage the fact that the frequency distribution of natural-language text follows a power law. Given a document collection, a set of indexes is proposed for materialization so that the time for intersecting keywords does not exceed a given threshold Delta. When considering the associated space requirement, the additional indexes are limited. Materialization of such a set of indexes for reasonable values of Delta (e.g., the time required to scan 20% of the largest inverted index), at least for a collection of short documents is distributed by the power law.
|