Title of invention DIVISION-BASED HIGH-DIMENSIONAL SIMILARITY JOIN METHOD
Abstract PURPOSE: A division-based high-dimensional similarity join method is provided that divides a high-dimensional data space along a chosen set of dimensions and joins the data sets cell by cell. The dimensions to divide, and how many of them, are determined dynamically in advance, so that similarity can be measured efficiently. CONSTITUTION: A system divides the entire data space into cells bounded by the similarity limit (S100). The system searches the two data sets participating in the similarity join for data pairs that satisfy a specified similarity requirement (S400). Assuming that the average size of a cell corresponds to one disk block, the system calculates the total number of divided cells (S200). The system calculates the number of cells obtained when the space is divided along predetermined dimensions (S210). The system derives the number of dimensions to divide (S220). For each dimension, the system calculates the number of entries of each data set that fall into each cell (S300). Using these entry counts, the system obtains the number of distance calculations incurred when joining the divided cells (S310). The system designates the dimension having the lowest join cost as the final division dimension (S320).
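The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract gives no formulas, so the following are all assumptions made for illustration: points lie in the unit hypercube, similarity is Euclidean distance within a threshold `eps`, each divided dimension is cut into equal-width intervals at least `eps` wide, the number of divided dimensions is the smallest count whose cell total reaches the number of disk blocks, and the join cost of a dimension is the number of candidate pairs in the same or adjacent intervals. The names `cell_index`, `choose_dimensions`, `similarity_join`, and `entries_per_block` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def cell_index(x, k):
    # Map a coordinate in [0, 1] to one of k equal-width intervals.
    return min(int(x * k), k - 1)


def choose_dimensions(R, S, eps, entries_per_block):
    """Pick the dimensions to divide (rough analogue of S200 to S320)."""
    d = len(R[0])
    # Assumed: intervals per divided dimension, each at least eps wide.
    k = max(1, int(1.0 / eps))
    # Assumed S200: one cell per disk block on average.
    n_cells = math.ceil((len(R) + len(S)) / entries_per_block)
    # Assumed S220: smallest number of dimensions with k**n_dims >= n_cells.
    if k < 2 or n_cells < 2:
        n_dims = 1
    else:
        n_dims = min(d, max(1, math.ceil(math.log(n_cells) / math.log(k))))

    costs = []
    for dim in range(d):
        # S300: entries of each data set per interval along this dimension.
        r_cnt, s_cnt = [0] * k, [0] * k
        for p in R:
            r_cnt[cell_index(p[dim], k)] += 1
        for q in S:
            s_cnt[cell_index(q[dim], k)] += 1
        # Assumed S310: distance calculations = pairs in same/adjacent intervals.
        cost = 0
        for j in range(k):
            near = s_cnt[j]
            if j > 0:
                near += s_cnt[j - 1]
            if j + 1 < k:
                near += s_cnt[j + 1]
            cost += r_cnt[j] * near
        costs.append((cost, dim))

    # S320: keep the dimensions with the lowest estimated join cost.
    costs.sort()
    return sorted(dim for _, dim in costs[:n_dims]), k


def similarity_join(R, S, eps, entries_per_block=4):
    """Return all (p, q) with p in R, q in S and distance <= eps (S100, S400)."""
    dims, k = choose_dimensions(R, S, eps, entries_per_block)
    # S100: place every S point in its cell over the chosen dimensions.
    grid = defaultdict(list)
    for q in S:
        grid[tuple(cell_index(q[i], k) for i in dims)].append(q)
    # S400: compare each R point only with S points in neighbouring cells.
    result = []
    for p in R:
        key = tuple(cell_index(p[i], k) for i in dims)
        for offset in product((-1, 0, 1), repeat=len(dims)):
            cell = tuple(a + b for a, b in zip(key, offset))
            for q in grid.get(cell, ()):
                if math.dist(p, q) <= eps:
                    result.append((p, q))
    return result
```

Because every interval is at least `eps` wide, two points within `eps` of each other can differ by at most one interval along any divided dimension, so checking the neighbouring cells is enough to find every qualifying pair under these assumptions. Dimensions along which the data is spread evenly tend to receive a lower estimated cost than dimensions where most points share an interval.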
Publication number KR20040023388(A) Publication date 2004.03.18
Application number KR20020055101 Filing date 2002.09.11
Applicant SAMSUNG ELECTRONICS CO., LTD. Inventor SHIN, HYO SEOP
Classification G06F17/15;G06F7/00;G06F12/00;G06F15/16;G06F17/10;G06F17/30;G06T5/00;(IPC1-7):G06T5/00 Main classification G06F17/15
Agency Agent
Main claim
Address