Title: DOCUMENT CATEGORIZING DEVICE, METHOD THEREOF AND PROGRAM
Abstract: PROBLEM TO BE SOLVED: To provide a document categorizing device that classifies documents in a way close to human sensibility. SOLUTION: The portion of an input document containing a plurality of topics that corresponds to a certain topic is defined as a reference document, and a document vector is extracted from it. Inter-vector similarity is determined between this document vector and each topic vector, where a topic vector is the centroid of the document vectors included in a topic class obtained by clustering the topics determined from sample documents. A category sample document vector correspondence table associates, for each category, the sample document vectors determined from topic-classified sample documents with their topics. For each topic class with high inter-vector similarity, the inter-vector similarity between the corresponding topic vector and the sample document vector in the category sample document vector correspondence table is determined. Each of these similarities is multiplied by the degree of importance of the topic, the products are accumulated to give the document similarity, and the reference document is classified into the category having the highest document similarity.
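The classification steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say which similarity measure is used, how "high" similarity is decided, or how the degree of importance is computed, so the sketch assumes cosine similarity, a top-n cutoff (`top_n`), and caller-supplied importance weights. All function names, parameter names, and data values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
    Assumption: the abstract only says "inter-vector similarity"."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(ref_vec, topic_classes, category_table, importance, top_n=2):
    """Return (best_category, scores) for a reference document vector.

    topic_classes:  topic -> list of document vectors in that topic class
    category_table: category -> {topic -> sample document vector}
    importance:     topic -> weight (hypothetical; source of weights is assumed)
    top_n:          number of topic classes treated as "high similarity" (assumed)
    """
    # Topic vector = centroid of the document vectors in each topic class.
    topic_vecs = {t: centroid(vs) for t, vs in topic_classes.items()}
    # Similarity between the reference document and every topic vector.
    ref_sim = {t: cosine(ref_vec, v) for t, v in topic_vecs.items()}
    # Keep only the topic classes most similar to the reference document.
    selected = sorted(ref_sim, key=ref_sim.get, reverse=True)[:top_n]
    # Document similarity per category: importance-weighted sum of the
    # similarities between topic vectors and the category's sample vectors.
    scores = {
        cat: sum(
            importance[t] * cosine(topic_vecs[t], per_topic[t])
            for t in selected
            if t in per_topic
        )
        for cat, per_topic in category_table.items()
    }
    return max(scores, key=scores.get), scores


# Toy data (invented purely for illustration).
topic_classes = {
    "billing": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "delivery": [[0.0, 1.0, 0.0], [0.2, 0.8, 0.0]],
    "weather": [[0.0, 0.0, 1.0]],
}
category_table = {
    "complaint": {"billing": [0.9, 0.1, 0.0], "delivery": [0.1, 0.9, 0.0]},
    "inquiry": {"billing": [0.3, 0.3, 0.4], "delivery": [0.2, 0.3, 0.5]},
}
importance = {"billing": 1.0, "delivery": 0.5, "weather": 0.1}

best, scores = classify([0.7, 0.3, 0.0], topic_classes, category_table, importance)
print(best, scores)
```

With this toy data the reference vector is closest to the "billing" and "delivery" topic classes, and the "complaint" category receives the larger accumulated score. A threshold on similarity could replace the top-n cutoff without changing the rest of the sketch.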
Publication number: JP2013191194(A)  Publication date: 2013.09.26
Application number: JP20120136868  Filing date: 2012.06.18
Applicant: NIPPON TELEGR & TELEPH CORP <NTT>  Inventors: TAMOTO SHINJI; MASATAKI HIROKAZU; YOSHIOKA OSAMU; TAKAHASHI SATOSHI
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Principal claim:
Address: