Title of invention: Semantic similarity based document retrieval
Abstract: A method and apparatus are provided for generating, from an input set of documents, a word replaceability matrix defining semantic similarity between words occurring in the input document set. For each word, distinct word sequences of predetermined length are identified from the documents of the set, each word sequence being indicative of the context in which the word was used. According to the relative frequency of occurrence of the identified word sequences for the word, a fuzzy set is generated for each word, comprising membership values for corresponding groups of word sequences. For each pair of words occurring in the document set, their respective fuzzy sets are used to calculate the probability that the first word of the pair is semantically suitable as a replacement for the second word of the pair. These probabilities are collated to form a word similarity matrix for use in an improved method of determining document similarity and in information retrieval.
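The pipeline summarised in the abstract can be illustrated with a minimal Python sketch. It is not the claimed method: the three-word window, the (previous word, next word) context, the normalisation of counts by the most frequent context to obtain membership values, and the function names `context_counts`, `fuzzy_set` and `replaceability_matrix` are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter, defaultdict


def context_counts(documents):
    """Count the (previous word, next word) contexts of every word.

    Assumption: the 'word sequence of predetermined length' is a
    three-word window with the target word in the middle.
    """
    counts = defaultdict(Counter)
    for doc in documents:
        tokens = doc.lower().split()
        for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
            counts[word][(prev, nxt)] += 1
    return counts


def fuzzy_set(counter):
    """Turn context counts into membership values in [0, 1].

    Assumption: membership is the count divided by the count of the
    most frequent context; the abstract only says memberships follow
    from relative frequency of occurrence.
    """
    top = max(counter.values())
    return {ctx: n / top for ctx, n in counter.items()}


def replaceability_matrix(counts):
    """Score how suitable w1 is as a replacement for w2.

    Assumption: the score is the expected membership, in the fuzzy set
    of w1, of a context drawn from the observed contexts of w2.
    """
    fuzzy = {w: fuzzy_set(c) for w, c in counts.items()}
    matrix = {}
    for w1 in counts:
        for w2, c2 in counts.items():
            total = sum(c2.values())
            matrix[(w1, w2)] = sum(
                (n / total) * fuzzy[w1].get(ctx, 0.0)
                for ctx, n in c2.items()
            )
    return matrix


docs = ["the quick brown fox", "the quick red fox"]
matrix = replaceability_matrix(context_counts(docs))
print(matrix[("brown", "red")])    # "brown" and "red" share the context (quick, fox)
print(matrix[("brown", "quick")])  # no shared context
```

In this toy input, "brown" and "red" occur only between "quick" and "fox", so each scores 1.0 as a replacement for the other, while "brown" scores 0.0 as a replacement for "quick". A document similarity measure could then give partial credit to non-identical words according to these scores.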
Publication number: US7644047(B2)  Publication date: 2010.01.05
Application number: US20060573152  Filing date: 2006.03.23
Applicant: BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY  Inventors: ASSADIAN BEHRAD; AZVINE BEHNAM; MARTIN TREVOR P
Classification: G06F15/18; G06F7/00; G06F17/27; G06F17/30  Main classification: G06F15/18