Title of invention DOCUMENT CLASSIFICATION DEVICE, DOCUMENT CLASSIFICATION METHOD AND DOCUMENT CLASSIFICATION PROGRAM
Abstract PROBLEM TO BE SOLVED: To provide a document classification device that achieves excellent classification performance on unlabelled documents containing both unknown words and known words, and that can estimate class labels even for unlabelled documents composed only of unknown words. SOLUTION: A word pair generation unit 4 of a document classification device 1 extracts word pairs from external documents to generate word pair data. A characterization addition unit 20 adds a document feature representation to each labelled document (yielding characterized labelled documents) and to each unlabelled document (yielding characterized unlabelled documents). A pseudo characterization addition unit 50 divides the characterized labelled documents into words and, using the word pair data, determines a pseudo feature representation made up of the words paired with those segmented words. This pseudo feature representation is added to the characterized labelled documents to form pseudo characterized labelled documents. An estimation unit 110 estimates the label of each characterized unlabelled document using a class classification model generated by learning from the pseudo characterized labelled documents.
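The pipeline in the abstract can be illustrated with a minimal Python sketch. Several details are assumptions, not taken from the publication: word pairs are formed from words that co-occur in the same external document, the document feature representation is a bag-of-words count, and a multinomial naive Bayes classifier stands in for the unspecified class classification model. The function and class names (`generate_word_pairs`, `featurize`, `add_pseudo_features`, `NaiveBayes`) are hypothetical. Following the abstract, only the labelled documents receive pseudo features; the unlabelled document is classified from its plain feature representation.

```python
from collections import Counter, defaultdict
import math


def generate_word_pairs(external_docs):
    """Hypothetical word pair generation: every two distinct words that
    co-occur in the same external document form a pair (assumed criterion)."""
    pairs = defaultdict(set)
    for doc in external_docs:
        words = set(doc.split())
        for a in words:
            for b in words:
                if a != b:
                    pairs[a].add(b)
    return pairs


def featurize(doc):
    """Document feature representation, assumed here to be word counts."""
    return Counter(doc.split())


def add_pseudo_features(features, pairs):
    """Pseudo feature representation: for each word of a labelled document,
    also count the words paired with it in the word pair data."""
    pseudo = Counter(features)
    for word, count in features.items():
        for partner in pairs.get(word, ()):
            pseudo[partner] += count
    return pseudo


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing; a stand-in for the
    unspecified class classification model."""

    def fit(self, feature_list, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(feature_list, labels):
            self.word_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)
            for word, c in feats.items():
                if word in self.vocab:  # words never seen in training are ignored
                    score += c * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    external = ["soccer goal striker", "election vote parliament"]
    labelled = [("goal scored", "sports"), ("vote counted", "politics")]

    pairs = generate_word_pairs(external)
    train_feats = [add_pseudo_features(featurize(d), pairs) for d, _ in labelled]
    model = NaiveBayes().fit(train_feats, [y for _, y in labelled])

    # "striker" and "parliament" never occur in the labelled documents.
    for doc in ["striker", "parliament"]:
        print(doc, "->", model.predict(featurize(doc)))
    # prints: striker -> sports
    #         parliament -> politics
```

In this toy example, the unlabelled documents "striker" and "parliament" consist only of unknown words. Without pseudo features they would fall outside the training vocabulary and give the model no evidence; the word pairs carry class evidence from "goal" and "vote" over to them.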
Application publication number JP2015176511(A) Publication date 2015.10.05
Application number JP20140054258 Application date 2014.03.18
Applicant NIPPON TELEGR & TELEPH CORP <NTT> Inventors TOKUNAGA YOKO; KIYOTAKE HIROSHI; KAZUHARA YOSHIHIKO; TODA HIROYUKI; WASHISAKI SEIJI
Classification number G06F17/30 Main classification number G06F17/30