Title of invention: FINITE-STATE TRANSDUCTION OF RELATED WORD FORMS FOR TEXT INDEXING AND RETRIEVAL
Abstract: The present invention solves a number of problems in using stems (canonical indicators of word meanings) in full-text retrieval of natural language documents, and thus permits recall to be improved without sacrificing precision. It uses various arrangements of finite-state transducers (FSTs) to accurately encode a number of desirable ways of mapping back and forth between words and stems, taking into account both the systematic aspects of a language's morphological rule system and the word-by-word irregularities that also occur. The merged FST (70) may be produced by simultaneously intersecting (&) and composing (o) a lexicon transducer (65) and a number of rule transducers (61-63). Although the resulting FSTs can have many states and transitions (arcs), they can be compacted by finite-state compression algorithms so that they can be used effectively in resource-limited applications. The invention contemplates an information retrieval system comprising the novel FST (70) as a database and a processor for responding to user queries, searching the database, and outputting proper responses, if they exist, as well as the novel database used in such a system and methods for constructing the novel database.
Publication number: EP0583083(A3) Publication date: 1994.09.07
Application number: EP19930305626 Filing date: 1993.07.19
Applicant: XEROX CORPORATION Inventors: CUTTING, DOUGLASS R.;HALVORSEN, PER-KRISTIAN G.;KAPLAN, RONALD M.;KARTTUNEN, LAURI;KAY, MARTIN;PEDERSEN, JAN O.
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address:
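The two-way mapping between words and stems described in the abstract can be illustrated with a toy transducer. The following Python sketch is not the patented construction: it does not intersect or compose lexicon and rule transducers, and it applies no compression. It simply builds a small trie-shaped FST from a hand-written list of (word, stem) pairs, including one irregular form, and shows that the same machine can be run in either direction. The names `build_fst`, `invert`, `transduce`, `PAIRS`, and `FST` are hypothetical and chosen only for this illustration.

```python
from itertools import zip_longest

def build_fst(pairs):
    """Build a trie-shaped transducer from (word, stem) pairs.

    Each arc is labelled (input_symbol, output_symbol); the empty string
    stands for epsilon when one side is shorter than the other.
    """
    arcs = {0: {}}      # state -> {(in_sym, out_sym): next_state}
    finals = set()
    for word, stem in pairs:
        state = 0
        for label in zip_longest(word, stem, fillvalue=""):
            if label not in arcs[state]:
                new_state = len(arcs)
                arcs[state][label] = new_state
                arcs[new_state] = {}
            state = arcs[state][label]
        finals.add(state)
    return arcs, finals

def invert(fst):
    """Swap the input and output side of every arc (stem -> word direction)."""
    arcs, finals = fst
    swapped = {s: {(o, i): t for (i, o), t in out.items()}
               for s, out in arcs.items()}
    return swapped, finals

def transduce(fst, text):
    """Return every output string the transducer produces for `text`."""
    arcs, finals = fst
    results = set()
    stack = [(0, 0, "")]            # (state, position in text, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(text) and state in finals:
            results.add(out)
        for (i, o), nxt in arcs[state].items():
            if i == "":             # epsilon input: emit without consuming
                stack.append((nxt, pos, out + o))
            elif pos < len(text) and text[pos] == i:
                stack.append((nxt, pos + 1, out + o))
    return results

# Illustrative data only (an assumption, not taken from the patent).
PAIRS = [("walk", "walk"), ("walks", "walk"), ("walked", "walk"),
         ("go", "go"), ("goes", "go"), ("went", "go")]
FST = build_fst(PAIRS)

print(transduce(FST, "went"))                   # word -> stem
print(sorted(transduce(invert(FST), "walk")))   # stem -> all listed word forms
```

Running the machine forwards normalizes a query or document word to its stem for indexing; running the inverted machine expands a stem into all of its listed surface forms, which is the kind of back-and-forth mapping the abstract refers to. The trie is acyclic, so the epsilon arcs cannot cause the search to loop.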