Abstract |
The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers (FSTs) to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both the systematic aspects of a language's morphological rule system and the word-by-word irregularities that also occur. A merged FST (70) may be produced by simultaneously intersecting (&) and composing (o) a lexicon transducer (65) and a number of rule transducers (61-63). Although the resulting FSTs can have many states and transitions (arcs), they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications. The invention contemplates an information retrieval system comprising the novel FST (70) as a database and a processor for responding to user queries, searching the database, and outputting proper responses, if they exist, as well as the novel database used in such a system and methods for constructing the novel database. |
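As an illustration of mapping back and forth between words and stems with a single transducer, the following is a minimal Python sketch. It is not the construction described in the abstract: it does not intersect or compose lexicon and rule transducers, and it applies no compression. It builds one path per enumerated stem/word pair, which is enough to show that the same arcs can be read in either direction and that irregular forms sit alongside regular ones. The names `ToyFST`, `build_fst`, `expand`, and the word list `PAIRS` are hypothetical and chosen for this example only.

```python
from collections import namedtuple

EPS = ""  # epsilon label: reads or writes nothing

# One arc: upper side carries a stem symbol, lower side a surface-word symbol.
Arc = namedtuple("Arc", "src upper lower dst")


class ToyFST:
    """A tiny acyclic transducer stored as a flat arc list (illustrative only)."""

    def __init__(self, arcs, start, finals):
        self.arcs = arcs
        self.start = start
        self.finals = set(finals)

    def apply(self, text, direction):
        """Return all strings paired with `text`.

        direction="up":   read the word side, write the stem side.
        direction="down": read the stem side, write the word side.
        """
        results = set()
        stack = [(self.start, 0, "")]  # (state, input position, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(text) and state in self.finals:
                results.add(out)
            for arc in self.arcs:
                if arc.src != state:
                    continue
                if direction == "down":
                    read, write = arc.upper, arc.lower
                else:
                    read, write = arc.lower, arc.upper
                if read == EPS:
                    stack.append((arc.dst, pos, out + write))
                elif text.startswith(read, pos):
                    stack.append((arc.dst, pos + len(read), out + write))
        return sorted(results)


def build_fst(pairs):
    """Build one fresh path per (stem, word) pair, padding the shorter side with EPS."""
    arcs, finals, next_state = [], [], 1
    for stem, word in pairs:
        state = 0
        for i in range(max(len(stem), len(word))):
            up = stem[i] if i < len(stem) else EPS
            lo = word[i] if i < len(word) else EPS
            arcs.append(Arc(state, up, lo, next_state))
            state = next_state
            next_state += 1
        finals.append(state)
    return ToyFST(arcs, 0, finals)


# Hypothetical mini-lexicon mixing regular and irregular forms.
PAIRS = [
    ("swim", "swim"), ("swim", "swims"), ("swim", "swam"),
    ("swim", "swum"), ("swim", "swimming"),
    ("mouse", "mouse"), ("mouse", "mice"),
]

fst = build_fst(PAIRS)


def expand(word):
    """Query expansion: word -> stem(s) -> every known variant of those stems."""
    variants = set()
    for stem in fst.apply(word, "up"):
        variants.update(fst.apply(stem, "down"))
    return sorted(variants)


print(fst.apply("mice", "up"))  # ['mouse']
print(expand("swam"))           # ['swam', 'swim', 'swimming', 'swims', 'swum']
```

In this sketch a query for "swam" is first mapped up to the stem "swim" and then down to every listed variant, which is one way stem-based expansion can raise recall while staying restricted to forms the transducer actually licenses. A word absent from the list yields an empty result rather than a guessed stem.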