Title of invention: METHOD AND DEVICE FOR CALCULATING DEGREE OF SIMILARITY BETWEEN FILES PERTAINING TO DIFFERENT FIELDS
Abstract: A method and device for calculating the degree of similarity between files from different fields. The method comprises: storing files from different fields, together with the relationships between any two files from different fields (S101); tokenizing the files and removing stop words to obtain a word data set for the files (S102); constructing a correlation matrix for the files on the basis of the relationships between any two files from different fields (S103); obtaining a topic cluster for the files on the basis of the word data set (S104); obtaining, on the basis of the correlation matrix and the topic cluster, the probability that each topic in the cluster occurs in each file and the matching weight of each topic between any two different fields (S105); and calculating the degree of similarity between any two files from different fields on the basis of the probabilities that each topic occurs in the two files and the matching weight of that topic between the two fields to which the files respectively belong (S106). The method and device may improve the accuracy of the degree of similarity calculated between files from different fields.
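Publication No.: WO2015096468(A1) Publication date: 2015.07.02
Application No.: WO2014CN82526 Filing date: 2014.07.18
Applicant: HUAWEI TECHNOLOGIES CO., LTD. Inventors: WANG, LIANGWEI; LEUNG, WINGKI; YANG, YANG
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim:
Address: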
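
Illustrative sketch of steps S102 and S106 of the abstract. The abstract does not give the formulas, so everything below is an assumption made for illustration: the function names `tokenize` and `cross_field_similarity`, the regular-expression tokenizer, and in particular the scoring rule, which is taken here to be a weighted sum over topics of the product of the two files' topic probabilities. The topic probabilities and matching weights (S105) are treated as given inputs; how they are derived from the correlation matrix and the topic cluster is not shown.

```python
import re
from typing import Iterable, List, Sequence


def tokenize(text: str, stop_words: Iterable[str]) -> List[str]:
    """S102 (assumed form): lowercase, split into alphanumeric tokens,
    and drop stop words."""
    stop = {w.lower() for w in stop_words}
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop]


def cross_field_similarity(
    topic_probs_a: Sequence[float],
    topic_probs_b: Sequence[float],
    topic_weights: Sequence[float],
) -> float:
    """S106 (assumed form): sum over topics k of
    P(k | file a) * P(k | file b) * w_k, where w_k is the matching weight
    of topic k between the field of file a and the field of file b.

    The weighted-product rule is an assumption; the abstract only states
    that the similarity depends on these quantities.
    """
    if not (len(topic_probs_a) == len(topic_probs_b) == len(topic_weights)):
        raise ValueError("all inputs must have one entry per topic")
    return sum(
        pa * pb * w
        for pa, pb, w in zip(topic_probs_a, topic_probs_b, topic_weights)
    )


if __name__ == "__main__":
    # Hypothetical values for three topics; not taken from the patent.
    probs_a = [0.7, 0.2, 0.1]   # topic probabilities for a file in field A
    probs_b = [0.6, 0.1, 0.3]   # topic probabilities for a file in field B
    weights = [1.0, 0.5, 0.2]   # per-topic matching weights between A and B
    print(tokenize("The method and the device", ["the", "and"]))
    print(cross_field_similarity(probs_a, probs_b, weights))
```

With the hypothetical inputs above, the score is approximately 0.436. A topic with a low matching weight between the two fields contributes little even when both files express it strongly, which is one plausible reading of how the per-field weights in S105 could influence the result.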