Title: SYSTEM AND METHOD FOR AUTOMATIC DOCUMENT CLASSIFICATION IN EDISCOVERY, COMPLIANCE AND LEGACY INFORMATION CLEAN-UP
Abstract: A system, method and computer program product for automatic document classification, including an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification for display to a user.
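Publication number: US2014156567(A1); Publication date: 2014.06.05
Application number: US201213693075; Filing date: 2012.12.04
Applicant: MSC INTELLECTUAL PROPERTIES B.V.; Inventor: Scholtes Johannes Cornelis
Classification: G06N99/00; Main classification: G06N99/00
Agency: (not listed); Agent: (not listed)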
Principal claim: 1. A computer implemented system for automatic document classification, the system comprising: an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification for display to a user.
Address: Amsterdam NL
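Illustrative sketch (not part of the patent record): the three modules named in the claim (extraction, machine learning, classification) can be pictured with a small Python example. Everything in it is an assumption made for illustration: the function names `extract_features`, `train_model`, `cosine` and `classify` are hypothetical, the "extraction" is reduced to lowercase word tokenization, and the "model representation" is a simple per-category term-count vector matched by cosine similarity. The record does not say which features or learning algorithm the claimed system uses.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text):
    """Hypothetical extraction step: lowercase the text and count word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)  # sparse feature vector: term -> count


def train_model(labeled_docs):
    """Hypothetical learning step: sum the feature vectors of each category."""
    model = defaultdict(Counter)
    for text, category in labeled_docs:
        model[category].update(extract_features(text))
    return dict(model)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, model):
    """Hypothetical classification step: pick the most similar category vector."""
    vector = extract_features(text)
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(model), key=lambda cat: cosine(vector, model[cat]))


# Made-up training examples; a real collection would be far larger.
training = [
    ("Please review the attached contract and indemnity clause", "legal"),
    ("The lawsuit deposition is scheduled with opposing counsel", "legal"),
    ("Team lunch on Friday, pizza will be provided", "social"),
    ("Happy birthday, cake in the kitchen at noon", "social"),
]
model = train_model(training)
print(classify("Counsel asked for the contract before the deposition", model))
```

With these made-up training sentences, the unclassified document shares "counsel", "contract" and "deposition" with the "legal" vector, so that category scores highest. A production system would presumably use richer structural and semantic features than raw word counts.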