Abstract |
An approach to multidimensional substring selectivity estimation uses set hashing to generate cross-counts on demand, rather than storing cross-counts for the most frequently co-occurring substrings. Set hashing is a Monte Carlo technique that succinctly represents the set of tuples containing a given substring. Intersecting any combination of set hashes then yields a cross-count. Because only an intersection function is required, the technique extends to three, four, and higher dimensions.
|
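The idea of intersecting set hashes to obtain a cross-count can be illustrated with a minimal min-hash sketch. This is not the patented implementation; it rests on several assumptions: the names `set_hash` and `estimate_cross_count` are hypothetical, salted BLAKE2b stands in for the random permutations, tuple ids are non-negative integers below 2^64, the exact size of each tuple set is stored alongside its signature, and the union size is recovered from the component-wise minimum of the signatures, which is one common way to do it.

```python
import hashlib


def _hash(i, tuple_id):
    """i-th hash function (assumption: salted BLAKE2b stands in for a random permutation)."""
    digest = hashlib.blake2b(
        tuple_id.to_bytes(8, "little"),
        digest_size=8,
        salt=i.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def set_hash(tuple_ids, k=256):
    """Signature of a non-empty set of tuple ids: the minimum of each of k hash functions."""
    return [min(_hash(i, t) for t in tuple_ids) for i in range(k)]


def estimate_cross_count(signatures, sizes):
    """Estimate the size of the intersection of d tuple sets.

    signatures: one signature per substring, all of the same length k.
    sizes: exact size of each tuple set (assumed to be stored with the signature).
    """
    k = len(signatures[0])
    columns = list(zip(*signatures))

    # Resemblance |intersection| / |union|: fraction of positions where all signatures agree.
    agree = sum(1 for col in columns if len(set(col)) == 1)
    resemblance = agree / k

    # The signature of the union is the component-wise minimum of the signatures.
    union_sig = [min(col) for col in columns]

    # |largest set| / |union| is estimated by how often the largest set holds the minimum.
    j = max(range(len(sizes)), key=lambda idx: sizes[idx])
    match = sum(1 for u, s in zip(union_sig, signatures[j]) if u == s)
    if match == 0:
        return 0.0
    union_size = sizes[j] * k / match

    return resemblance * union_size


if __name__ == "__main__":
    # Hypothetical tuple sets for three substrings, one per dimension.
    a = set(range(0, 400))
    b = set(range(200, 600))
    c = set(range(300, 700))
    sigs = [set_hash(s) for s in (a, b, c)]
    print(estimate_cross_count(sigs, [len(a), len(b), len(c)]))
```

The same function handles two, three, or more dimensions because it only compares signatures position by position. A larger `k` tightens the estimate at the cost of longer signatures.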