Title: System and methods for character string vector generation
Abstract: The invention provides a similarity calculation device suited to calculating the similarities of words effectively, such that each word is reflected impartially in the similarity calculation in accordance with its frequency of occurrence. First, document vectors are generated on the basis of a plurality of document data. Each document vector has elements corresponding to respective morphemes, and each element is calculated so as to take a value conforming to the frequency of occurrence of the corresponding morpheme. Subsequently, word vectors are generated using the transposed matrix of a document-word matrix in which the generated document vectors are gathered. Accordingly, each word vector has elements corresponding to the respective document data, and each element takes a value that is proportional to the frequency of occurrence of the morpheme in the corresponding document data and inversely proportional to the frequency of occurrence of the morpheme across the plurality of document data. Thereafter, the similarity of a word can be calculated on the basis of its word vector.
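The pipeline described in the abstract (document vectors, transposition into word vectors, then similarity) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: whitespace splitting stands in for morphological analysis, the element weight is assumed to be the in-document count divided by the collection-wide count (the abstract gives only the proportionality, not an exact formula), and cosine similarity is assumed as the similarity measure. All function names are hypothetical.

```python
import math
from collections import Counter


def build_document_vectors(docs):
    """Return (vocab, doc_matrix); doc_matrix[d][j] weights term vocab[j] in document d."""
    # Assumption: whitespace splitting stands in for morphological analysis.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    # Total occurrences of each term across the whole collection.
    collection_freq = Counter(tok for toks in tokenized for tok in toks)
    doc_matrix = []
    for toks in tokenized:
        term_freq = Counter(toks)
        # Assumed weight: in-document count divided by collection-wide count.
        doc_matrix.append([term_freq[t] / collection_freq[t] for t in vocab])
    return vocab, doc_matrix


def build_word_vectors(vocab, doc_matrix):
    """Transpose the document-word matrix so each term has one element per document."""
    columns = zip(*doc_matrix)
    return {term: list(col) for term, col in zip(vocab, columns)}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Illustrative data only; not taken from the patent.
docs = ["printer ink cartridge", "printer ink ink", "camera lens"]
vocab, doc_matrix = build_document_vectors(docs)
word_vectors = build_word_vectors(vocab, doc_matrix)
print(cosine_similarity(word_vectors["ink"], word_vectors["printer"]))
print(cosine_similarity(word_vectors["camera"], word_vectors["lens"]))
```

In this toy collection, "camera" and "lens" occur only in the third document, so their word vectors point in the same direction, while "printer" and "camera" share no document and have zero similarity. Dividing by the collection-wide count keeps a frequent term such as "ink" from dominating the comparison merely because it occurs often.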
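Publication number: US2003217066(A1) Publication date: 2003.11.20
Application number: US20030397163 Filing date: 2003.03.27
Applicant: SEIKO EPSON CORPORATION Inventor: KAYAHARA NAOKI
Classification: G06F17/30;(IPC1-7):G06F7/00 Main classification: G06F17/30
Agency: Agent:
Principal claim:
Address: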