Title of invention: Structure and method for efficient parallel high-dimensional similarity join
Abstract: Multidimensional similarity join finds pairs of multidimensional points that are within some small distance ε of each other. Databases in domains such as multimedia and time series can require a high number of dimensions. The ε-k-d-B tree has been proposed as a data structure that scales better as the number of dimensions increases than earlier data structures such as the R-tree (and its variants), the grid file, and the k-d-B tree. We present a cost model of the ε-k-d-B tree and use it to optimize the leaf size. This new leaf size is shown to perform better in most situations than previous work, which used a constant leaf size. We present novel parallel procedures for the ε-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew datasets, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former for low-skew datasets. The weights for the latter strategy are based on the same cost model that is used to determine optimal leaf sizes.
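To make the abstract's terms concrete, here is a minimal sequential sketch of an ε-k-d-B tree self-join plus an equi-depth partitioner, written from the general published description of the structure rather than from this patent's claims. It assumes coordinates normalized to [0, 1), Euclidean distance, and that the node at depth k splits on dimension k into stripes of width at least ε. Because a pair within ε differs by at most ε in every coordinate, only points in the same or adjacent stripes need to be compared. All names are hypothetical. `leaf_size` is a fixed parameter here, whereas the patent derives it from a cost model that is not reproduced. `stripe_costs` is a placeholder weight, not the patent's cost model. Splitting consecutive first-dimension stripes across `p` processors is also an assumption made for illustration.

```python
import math
import random
from itertools import combinations


class Node:
    """Leaf (points = list of indices) or internal node (children keyed by stripe)."""

    def __init__(self, points=None):
        self.points = points  # None marks an internal node
        self.children = {}    # stripe index -> child Node


def build(data, idx, eps, leaf_size, depth=0):
    """Split on dimension `depth` into stripes of width >= eps until leaves are small."""
    if len(idx) <= leaf_size or depth >= len(data[0]):
        return Node(points=idx)
    nstripes = max(1, int(1.0 / eps))  # last stripe absorbs the remainder of [0, 1)
    node = Node()
    buckets = {}
    for i in idx:
        s = min(int(data[i][depth] / eps), nstripes - 1)
        buckets.setdefault(s, []).append(i)
    for s, members in buckets.items():
        node.children[s] = build(data, members, eps, leaf_size, depth + 1)
    return node


def report(data, i, j, eps, out):
    if math.dist(data[i], data[j]) <= eps:  # Euclidean distance (assumed metric)
        out.add((min(i, j), max(i, j)))


def join(n1, n2, data, eps, out):
    """Report close pairs with one point under n1 and the other under n2."""
    if n1.points is not None and n2.points is not None:
        for i in n1.points:
            for j in n2.points:
                report(data, i, j, eps, out)
    elif n1.points is not None:
        for c in n2.children.values():
            join(n1, c, data, eps, out)
    elif n2.points is not None:
        for c in n1.children.values():
            join(c, n2, data, eps, out)
    else:
        # Both internal at the same depth, so both split on the same dimension:
        # only equal or adjacent stripes can hold points within eps of each other.
        for s, c1 in n1.children.items():
            for t in (s - 1, s, s + 1):
                if t in n2.children:
                    join(c1, n2.children[t], data, eps, out)


def self_join(node, data, eps, out):
    """Report close pairs with both points under `node`."""
    if node.points is not None:
        for i, j in combinations(node.points, 2):
            report(data, i, j, eps, out)
        return
    for s, child in node.children.items():
        self_join(child, data, eps, out)
        if s + 1 in node.children:
            join(child, node.children[s + 1], data, eps, out)


def similarity_self_join(data, eps, leaf_size=8):
    root = build(data, list(range(len(data))), eps, leaf_size)
    out = set()
    self_join(root, data, eps, out)
    return out


def equi_depth_ranges(weights, p):
    """Cut consecutive stripes into at most p contiguous ranges of ~equal total weight."""
    total = float(sum(weights))
    ranges, start, acc = [], 0, 0.0
    for s, w in enumerate(weights):
        acc += w
        if len(ranges) < p - 1 and acc >= total * (len(ranges) + 1) / p:
            ranges.append((start, s + 1))
            start = s + 1
    ranges.append((start, len(weights)))  # may be empty under extreme skew
    return ranges


def stripe_costs(counts):
    """Hypothetical weight: pair checks within a stripe plus with its right neighbour."""
    return [n * (n - 1) / 2 + n * (counts[k + 1] if k + 1 < len(counts) else 0)
            for k, n in enumerate(counts)]


if __name__ == "__main__":
    random.seed(0)
    pts = [tuple(random.random() for _ in range(4)) for _ in range(500)]
    print(len(similarity_self_join(pts, eps=0.15)), "pairs")
    counts = [sum(min(int(x[0] / 0.15), 5) == k for x in pts) for k in range(6)]
    print(equi_depth_ranges(counts, 3), equi_depth_ranges(stripe_costs(counts), 3))
```

Passing raw point counts to `equi_depth_ranges` gives plain equi-depth partitioning. Passing per-stripe cost estimates gives the weighted variant, which balances work rather than point counts when the data are skewed.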
Publication number: US5987468(A)
Publication date: 1999.11.16
Application number: US19970989847
Filing date: 1997.12.12
Applicant: HITACHI AMERICA LTD.
Inventors: SINGH, VINEET; ALSABTI, KHALED; RANKA, SANJAY
Classification: G06F15/173; G06F12/00; G06F17/30; (IPC1-7): G06F17/30
Main classification: G06F15/173