发明名称 Method of performing approximate substring indexing
摘要 Approximate substring indexing is accomplished by decomposing each string in a database into overlapping "positional q-grams", sequences of a predetermined length q, and containing information regarding the "position" of each q-gram within the string (i.e., 1<SUP>st </SUP>q-gram, 4<SUP>th </SUP>q-gram, etc.). An index is then formed of the tuples of the positional q-gram data (such as, for example, a B-tree index or a hash index). Each query applied to the database is similarly parsed into a plurality of positional q-grams (of the same length), and a candidate set of matches is found. Position-directed filtering is used to remove the candidates which have the q-grams in the wrong order and/or too far apart to form a "verified" output of matching candidates. If errors are permitted (defined in terms of an edit distance between each candidate and the query), an edit distance calculation can then be performed to produce the final set of matching strings.
申请公布号 US7010522(B1) 申请公布日期 2006.03.07
申请号 US20020174218 申请日期 2002.06.17
申请人 AT&T CORP. 发明人 JAGADISH HOSAGRAHAR VISVESVARAYA;KOUDAS NIKOLAOS;MUTHUKRISHNAN SHANMUGAVELAYUTHAM;SRIVASTAVA DIVESH
分类号 G06F17/30 主分类号 G06F17/30
代理机构 代理人
主权项
地址