Title of invention INFORMATION MANAGEMENT AND KEY TERM RETRIEVAL
Abstract <p>A method and apparatus are provided for extracting key terms from a data set. The method includes identifying a first set of one or more word groups, each of one or more words, that occur more than once in the data set, and removing from this first set a second set of word groups that are sub-strings of longer word groups in the first set. The remaining word groups are key terms. Each word group is weighted according to its frequency of occurrence within the data set. The weighting of any word group may be increased by the frequency of any sub-string of words occurring in the second set, after which each weighting is divided by the number of words in the word group. This weighting process operates to determine the order of occurrence of the word groups. Prefixes and suffixes are also removed from each word in the data set. This produces a neutral form of each word so that the weighting values are prefix and suffix independent.</p>
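The steps in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the patented implementation. The function names (`word_groups`, `is_sub_group`, `extract_key_terms`), the `max_len` cap on group length, whitespace tokenisation, and the exact weighting formula (own frequency plus the frequencies of contained second-set groups, divided by word count) are all assumptions made for illustration. Prefix and suffix removal is omitted.

```python
from collections import Counter


def word_groups(words, max_len):
    """Yield every contiguous group of 1..max_len words as a tuple."""
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def is_sub_group(short, long):
    """True if `short` is a strictly shorter contiguous run of words inside `long`."""
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def extract_key_terms(text, max_len=3):
    """Return (key term, weight) pairs, highest weight first.

    Assumed reading of the abstract; max_len is a hypothetical parameter.
    """
    words = text.lower().split()  # assumption: simple whitespace tokenisation
    counts = Counter(word_groups(words, max_len))

    # First set: word groups that occur more than once in the data set.
    first = {g: c for g, c in counts.items() if c > 1}

    # Second set: groups that are sub-strings of longer groups in the first set.
    second = {g for g in first if any(is_sub_group(g, other) for other in first)}

    weights = {}
    for group, freq in first.items():
        if group in second:
            continue  # removed from the first set, so not a key term
        # Assumed weighting: add the frequency of every second-set group
        # contained in this group, then divide by the number of words.
        bonus = sum(first[s] for s in second if is_sub_group(s, group))
        weights[" ".join(group)] = (freq + bonus) / len(group)

    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = "key term retrieval over data and key term retrieval from data"
    print(extract_key_terms(sample))
```

In the sample text, "key term retrieval" and "data" each occur twice. The shorter groups "key", "term", "retrieval", "key term" and "term retrieval" also repeat, but they fall into the second set because they are contained in the longer repeated group, so they only contribute to its weight.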
Publication number EP1032896(B1) Publication date 2002.03.27
Application number EP19980954628 Filing date 1998.11.18
Applicant BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY Inventor WEEKS, RICHARD
Classification G06F17/28;G06F17/30;(IPC1-7):G06F17/30 Main classification G06F17/28
Agency Agent
Principal claim
Address