Title of invention: UNSTRUCTURED DOCUMENT CLASSIFICATION
Abstract: A document classification method comprises: (i) classifying pages of an input document to generate page classifications; (ii) aggregating the page classifications to generate an input document representation, the aggregating not being based on ordering of the pages; and (iii) classifying the input document based on the input document representation. A page classifier for use in the page classifying operation (i) is trained based on pages of a set of labeled training documents having document classification labels. In some such embodiments, the pages of the set of labeled training documents are not labeled, and the page classifier training comprises: clustering pages of the set of labeled training documents to generate page clusters; and generating the page classifier based on the page clusters.
Publication number: US2011137898(A1)  Publication date: 2011.06.09
Application number: US20090632135  Filing date: 2009.12.07
Applicant: XEROX CORPORATION  Inventors: GORDO ALBERT; PERRONNIN FLORENT; RAGNET FRANCOIS
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Main claim:
Address:
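
The three operations in the abstract (page classification, order-independent aggregation, document classification), together with the clustering-based training of the page classifier, can be illustrated with a minimal sketch. The abstract does not name a clustering algorithm, a page feature representation, or a document classifier, so everything concrete here is an assumption made for illustration: pages are taken to be small numeric feature vectors, k-means stands in for the clustering step, the page classifier assigns a page to its nearest cluster center, the document representation is a normalized histogram of page classes, and the document classifier is a nearest-prototype rule. The function names and the labels "invoice" and "contract" are hypothetical.

```python
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(point, centers):
    """Index of the center closest to point (the assumed page classifier)."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(points, k, iters=20):
    """Tiny k-means; initialised from the first k points (an assumption)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster received no points.
        centers = [mean(groups[i]) if groups[i] else centers[i] for i in range(k)]
    return centers


def aggregate(page_classes, k):
    """Step (ii): normalized histogram of page classes; page order is ignored."""
    counts = Counter(page_classes)
    return [counts[c] / len(page_classes) for c in range(k)]


def represent(pages, centers):
    """Steps (i) and (ii): classify each page, then aggregate."""
    return aggregate([nearest(p, centers) for p in pages], len(centers))


def train(docs, k):
    """Cluster unlabeled pages, then build one prototype per document label."""
    all_pages = [p for pages, _ in docs for p in pages]
    centers = kmeans(all_pages, k)
    by_label = defaultdict(list)
    for pages, label in docs:
        by_label[label].append(represent(pages, centers))
    prototypes = {label: mean(reps) for label, reps in by_label.items()}
    return centers, prototypes


def classify(pages, centers, prototypes):
    """Step (iii): pick the label whose prototype is nearest to the document."""
    rep = represent(pages, centers)
    return min(prototypes, key=lambda label: sq_dist(rep, prototypes[label]))


# Hypothetical training set: only documents carry labels, pages do not.
docs = [
    ([[0.0, 0.1], [5.0, 5.1], [0.1, 0.0]], "invoice"),
    ([[5.1, 5.0], [4.9, 5.2], [0.0, 0.0]], "contract"),
]
centers, prototypes = train(docs, k=2)

new_doc = [[5.0, 5.0], [0.2, 0.1], [0.0, 0.2]]
print(classify(new_doc, centers, prototypes))
print(classify(new_doc[::-1], centers, prototypes))  # same pages, reversed order
```

Because the document representation is a histogram, shuffling or reversing the pages leaves it unchanged, which is the property the abstract requires of the aggregation step. A real system would likely replace each stand-in with a stronger component while keeping the same three-stage structure.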