Title of Invention |
DOCUMENT CATEGORIZING DEVICE, METHOD THEREOF AND PROGRAM |
Abstract |
PROBLEM TO BE SOLVED: To provide a document categorizing device that classifies documents in a way close to human intuition. SOLUTION: From an input document containing a plurality of topics, the portion corresponding to one topic is taken as a reference document, and a document vector is extracted from it. Topics determined from sample documents are cluster-classified into topic classes, and the topic vector of each class is the center of gravity of the document vectors in that class. An inter-vector similarity is computed between the reference document's vector and each topic vector. A category sample document vector correspondence table associates, for each category, sample document vectors (obtained from sample documents whose topics have been classified) with their topics. For each category, and for each topic class with a high inter-vector similarity, the similarity between that class's topic vector and the corresponding sample document vector in the table is computed. These similarities, each multiplied by the degree of importance of its topic, are summed to give a document similarity, and the reference document is classified into the category with the highest document similarity. |
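The scoring step of the abstract can be sketched as follows. This is a minimal, non-authoritative illustration under several assumptions not stated in the record: cosine similarity stands in for "inter-vector similarity", "high inter-vector similarity" is read as the `top_k` most similar topic classes, topic importance is a fixed per-topic weight (defaulting to 1.0), and the correspondence table holds one sample vector per (category, topic) pair. All names (`classify`, `topic_classes`, `category_table`, `importance`, `top_k`) are hypothetical; topic segmentation and vector extraction are not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Topic vector: component-wise mean (center of gravity) of a topic class."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def classify(
    ref_vec: Vector,
    topic_classes: Dict[str, List[Vector]],       # topic class -> clustered sample vectors
    category_table: Dict[str, Dict[str, Vector]],  # category -> {topic -> sample vector}
    importance: Dict[str, float],                 # hypothetical per-topic weights
    top_k: int = 2,
) -> str:
    # Topic vectors are the centroids of the clustered sample document vectors.
    topic_vecs = {t: centroid(vs) for t, vs in topic_classes.items()}
    # Assumed selection rule: keep the top_k topic classes most similar to the reference.
    ranked = sorted(topic_vecs, key=lambda t: cosine(ref_vec, topic_vecs[t]), reverse=True)
    selected = ranked[:top_k]
    scores: Dict[str, float] = {}
    for category, samples in category_table.items():
        score = 0.0
        for t in selected:
            if t in samples:
                # Similarity of topic vector to the category's sample vector, weighted by importance.
                score += cosine(topic_vecs[t], samples[t]) * importance.get(t, 1.0)
        scores[category] = score
    # Classify into the category with the highest accumulated document similarity.
    return max(scores, key=lambda c: scores[c])


topic_classes = {
    "sports": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "finance": [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
    "weather": [[0.0, 0.0, 1.0]],
}
category_table = {
    "news": {"sports": [0.9, 0.1, 0.0], "finance": [0.1, 0.9, 0.0]},
    "blog": {"sports": [0.2, 0.2, 0.6], "weather": [0.0, 0.1, 0.9]},
}
importance = {"sports": 1.0, "finance": 0.5, "weather": 0.8}

print(classify([0.7, 0.3, 0.0], topic_classes, category_table, importance))  # news
print(classify([0.0, 0.0, 1.0], topic_classes, category_table, importance))  # blog
```

With a sports-leaning reference vector, the "sports" and "finance" classes are selected and "news" accumulates the larger weighted score; a weather-like vector shifts the selection so that "blog" wins. The toy vectors and weights are invented purely for illustration.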
Publication Number
JP2013191194(A)
Publication Date
2013.09.26
Application Number
JP20120136868
Application Date
2012.06.18
Applicant
NIPPON TELEGR & TELEPH CORP <NTT>
Inventors
TAMOTO SHINJI;MASATAKI HIROKAZU;YOSHIOKA OSAMU;TAKAHASHI SATOSHI
Classification
G06F17/30
Main Classification
G06F17/30
Agency

Agent

Main Claim

Address
