Title of invention: METHOD AND DEVICE FOR CALCULATING DEGREE OF SIMILARITY BETWEEN FILES PERTAINING TO DIFFERENT FIELDS
Abstract: The present invention discloses a method and an apparatus for computing the similarity between cross-field documents. The method includes: storing documents of different fields and the relationship between any two documents of different fields; performing word segmentation and stop-word removal on the documents of different fields to obtain a vocabulary data set for those documents; constructing an incidence matrix between the documents of different fields according to the relationship between any two documents of different fields; obtaining a topic cluster of the documents of different fields according to the vocabulary data set; obtaining, according to the incidence matrix and the topic cluster, the probability that each topic in the topic cluster appears in each document and the matching weight of each topic for any two different fields; and computing the similarity between any two documents of different fields according to the probabilities that each topic in the topic cluster appears in the two documents and the matching weight of that topic for the fields to which the two documents belong. The embodiments of the present invention improve the accuracy of the similarity computed between cross-field documents.
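The abstract does not state how the per-topic probabilities and matching weights are combined into a single similarity score. The Python sketch below assumes, purely for illustration, a weighted inner product, sim(d_a, d_b) = Σ_k w_k(f_a, f_b) · p(k | d_a) · p(k | d_b), where p(k | d) is the probability that topic k appears in document d and w_k(f_a, f_b) is the matching weight of topic k for the field pair. The function name `cross_field_similarity`, the data layout, and the combination rule are hypothetical and are not taken from the patent.

```python
from typing import Dict, FrozenSet, List


def cross_field_similarity(
    theta_a: List[float],
    theta_b: List[float],
    field_a: str,
    field_b: str,
    match_weight: Dict[FrozenSet[str], List[float]],
) -> float:
    """Hypothetical weighted topic-overlap similarity for two documents.

    theta_a[k], theta_b[k]: probability that topic k appears in document a / b.
    match_weight[frozenset({field_a, field_b})][k]: matching weight of topic k
    for the pair of fields the two documents belong to.
    ASSUMPTION: the weighted inner product below is illustrative only; the
    patent abstract does not specify the combination rule.
    """
    # Unordered key, so (field_a, field_b) and (field_b, field_a) share weights.
    weights = match_weight[frozenset((field_a, field_b))]
    if not (len(theta_a) == len(theta_b) == len(weights)):
        raise ValueError("topic vectors and weights must have the same length")
    # Sum over topics: weight * p(topic | doc a) * p(topic | doc b).
    return sum(w * pa * pb for w, pa, pb in zip(weights, theta_a, theta_b))


if __name__ == "__main__":
    weights = {frozenset(("medicine", "law")): [1.0, 0.5, 0.0]}
    score = cross_field_similarity(
        [0.5, 0.3, 0.2], [0.4, 0.4, 0.2], "medicine", "law", weights
    )
    print(round(score, 4))  # prints 0.26
```

Keying the weights by an unordered `frozenset` makes the lookup independent of argument order, which fits the abstract's description of a matching weight defined for any two different fields.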
Publication number: EP3065066(A4)
Publication date: 2016.10.12
Application number: EP20140874314
Application date: 2014.07.18
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: WANG, LIANGWEI; LEUNG, WINGKI; YANG, YANG
Classification: G06F17/30
Main classification: G06F17/30
Agency:
Agent:
Main claim:
Address: