Abstract |
A search system for information retrieval comprises: a data structure in the form of a non-evenly spaced sparse suffix tree for storing suffixes of words and/or symbols, or sequences thereof, in a text T; a metric M that combines edit distance metrics for an approximate degree of matching between words and/or symbols, or between sequences thereof, in the text T and a query Q, the sequence distance metric including weighting cost functions for the edit operations that transform a sequence S of the text into a sequence P of the query Q; and search algorithms for determining the degree of matching between words and/or symbols, or between sequences thereof, in the text T and the query Q, such that information R is retrieved with a specified degree of matching with the query Q. Optionally, the search system also comprises algorithms for exact matching, so that information R may be retrieved with an exact degree of matching with the query Q. A method in the search system comprises: generating the data structure as a word-spaced sparse suffix tree; storing sequence information of the words in the text T in the generated suffix tree; generating a combined edit distance metric for words, or sequences thereof, in the text T and a query word q, or sequences thereof, in the query Q, the metric including word-weighting cost functions for the sequence-transforming edit operations; and determining the degree of matching between retrieved information R and a query Q. Use: in an approximate search engine.
|
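The combined metric described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation and rests on several assumptions: a flat list of word-aligned suffixes stands in for the word-spaced sparse suffix tree, the word-weighting cost function is taken to be a length-normalized character edit distance, and word insertions and deletions carry a fixed cost. All function names (`char_edit_distance`, `word_substitution_cost`, `sequence_distance`, `word_suffixes`, `search`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def char_edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two words (character level)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def word_substitution_cost(s: str, p: str) -> float:
    """Assumed weighting: character distance normalized to [0, 1]."""
    if s == p:
        return 0.0
    return char_edit_distance(s, p) / max(len(s), len(p))


def sequence_distance(S: Sequence[str], P: Sequence[str],
                      ins_del_cost: float = 1.0) -> float:
    """Word-level edit distance transforming text sequence S into query
    sequence P, with word substitutions weighted by the word metric."""
    prev = [j * ins_del_cost for j in range(len(P) + 1)]
    for i, s in enumerate(S, 1):
        cur = [i * ins_del_cost]
        for j, p in enumerate(P, 1):
            cur.append(min(prev[j] + ins_del_cost,
                           cur[j - 1] + ins_del_cost,
                           prev[j - 1] + word_substitution_cost(s, p)))
        prev = cur
    return prev[-1]


def word_suffixes(text: str) -> List[List[str]]:
    """Suffixes that start only at word boundaries (word-spaced)."""
    words = text.split()
    return [words[i:] for i in range(len(words))]


def search(text: str, query: str,
           max_distance: float) -> List[Tuple[int, str, float]]:
    """Return (word offset, matched words, distance) for every word-aligned
    suffix whose prefix is within max_distance of the query."""
    P = query.split()
    hits = []
    for start, suffix in enumerate(word_suffixes(text)):
        S = suffix[:len(P)]
        d = sequence_distance(S, P)
        if d <= max_distance:
            hits.append((start, " ".join(S), d))
    return hits


if __name__ == "__main__":
    print(search("the quick brown fox jumps", "quick brwn fox", 0.5))
```

Setting `max_distance` to 0 reduces the search to the optional exact-matching case. A real suffix tree would share common prefixes among the suffixes and allow branches to be pruned once the accumulated distance exceeds the threshold; the linear scan here only keeps the sketch short.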