Title of invention DOCUMENT CLASSIFICATION METHOD, DOCUMENT CLASSIFICATION PROGRAM AND DOCUMENT CLASSIFICATION DEVICE
Abstract PROBLEM TO BE SOLVED: To provide a document classification method that analyzes a Japanese text document, such as a news article or magazine article, and classifies it into the pertinent category with high accuracy. SOLUTION: The document classification method extracts the feature words used for classification in units of a modification structure and a meaning structure, both derived from the relation between the surface cases and the predicate of a sentence. The significance of each feature word is a weight that predicts its most likely category. In this weighting, a feature word whose occurrences are biased toward a specific category is treated as highly dependent on the category into which a document should be classified; the category in which the feature word appears most frequently is estimated as its most likely category, and the value of a statistical indicator computed for that category is used as the significance of the feature word. The similarity between a document to be classified and previously learned document data is compared in units of the modification structure and the meaning structure. The similarity between feature words is evaluated by comparing their superordinate concepts in a latent semantic space.
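The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistical indicator, so this example assumes a simple one: the fraction of a feature word's occurrences that fall in its most frequent category. The function name `weight_feature_words`, the hyphenated feature-word strings, and the toy data are all hypothetical and are not taken from the patent.

```python
from collections import Counter, defaultdict


def weight_feature_words(labeled_docs):
    """Map each feature word to (most likely category, significance).

    labeled_docs: iterable of (category, [feature_word, ...]) pairs.
    """
    counts = defaultdict(Counter)  # feature word -> category -> frequency
    for category, words in labeled_docs:
        for word in words:
            counts[word][category] += 1

    weights = {}
    for word, per_category in counts.items():
        # The category in which the word appears most often is taken
        # as its most likely category.
        likely_category, top = per_category.most_common(1)[0]
        # Assumed indicator (not specified in the abstract): the share of
        # the word's occurrences that belong to that category.
        weights[word] = (likely_category, top / sum(per_category.values()))
    return weights


# Toy training data: each feature word stands in for one extracted
# structure unit (hypothetical examples).
docs = [
    ("sports", ["team-wins", "player-scores"]),
    ("sports", ["team-wins", "price-rises"]),
    ("economy", ["price-rises", "bank-cuts"]),
    ("economy", ["price-rises"]),
]
weights = weight_feature_words(docs)
```

In this toy data, a word that occurs only in one category receives the maximum weight, while a word spread across categories receives a lower one, which mirrors the idea that words biased toward a specific category are more significant.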
Publication number JP2014056331(A) Publication date 2014.03.27
Application number JP20120199662 Filing date 2012.09.11
Applicant HITACHI ADVANCED SYSTEMS CORP Inventors EZAWA KENJI;KAKO ICHIRO;ABE ATSUSHI
Classification G06F17/30;G06F17/21 Main classification G06F17/30
Agency Agent
Principal claim
Address