Title of invention INFORMATION MANAGEMENT AND RETRIEVAL
Abstract A method and apparatus are provided for extracting key terms from a data set. The method includes the steps of identifying a first set of one or more word groups, each of one or more words, that occur more than once in the data set, and removing from this first set a second set of word groups that are sub-strings of longer word groups in the first set. The remaining word groups are key terms. Each word group is weighted according to its frequency of occurrence within the data set. The weighting of any word group may be increased by the frequency of any sub-string of words occurring in the second set, after which each weighting is divided by the number of words in the word group. This weighting process operates to determine the order of occurrence of the word groups. Prefixes and suffixes are also removed from each word in the data set, producing a neutral form of each word so that the weighting values are prefix and suffix independent.
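The steps described in the abstract can be illustrated with a minimal Python sketch. It is one possible reading, not the patented implementation. The affix lists, the `max_len` limit on word-group length, whitespace tokenization, and the function names `neutral_form`, `is_sub_group`, and `extract_key_terms` are all assumptions made for illustration. The sketch also assumes that the weighting of a key term is increased by the summed frequencies of every removed word group it contains.

```python
from collections import Counter

# Hypothetical affix lists for illustration; the abstract does not name any.
PREFIXES = ("un",)
SUFFIXES = ("ing", "s")


def neutral_form(word):
    """Lower-case a word and strip at most one assumed prefix and one assumed suffix."""
    word = word.lower()
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p) + 2:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s) + 2:
            word = word[:-len(s)]
            break
    return word


def is_sub_group(short, long):
    """True if tuple `short` is a contiguous run inside the strictly longer tuple `long`."""
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def extract_key_terms(text, max_len=4):
    """Return {key term: weighting} for a text; max_len is an assumed cap on group length."""
    words = [neutral_form(w) for w in text.split()]

    # Count every word group of 1..max_len consecutive words.
    counts = Counter(
        tuple(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    )

    # First set: word groups that occur more than once.
    first = {g: c for g, c in counts.items() if c > 1}

    # Second set: members of the first set contained in a longer member.
    second = {g for g in first if any(is_sub_group(g, h) for h in first)}

    weights = {}
    for group, freq in first.items():
        if group in second:
            continue  # removed; only the remaining groups are key terms
        # Assumed reading: add the frequency of each removed sub-group,
        # then divide by the number of words in the group.
        boost = sum(first[s] for s in second if is_sub_group(s, group))
        weights[" ".join(group)] = (freq + boost) / len(group)
    return weights
```

Under these assumptions, `extract_key_terms("neural network training and neural network testing")` returns `{"neural network": 3.0}`: the group occurs twice, gains 2 for each of its removed single-word sub-groups, and the total of 6 is divided by its 2 words. Sorting the returned items by weighting gives an ordering of the key terms.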
Publication number WO1999027469(A1) Publication date 1999.06.03
Application number GB1998003468 Filing date 1998.11.18
Applicant Inventor
Classification number Main classification number
Agency Agent
Principal claim
Address