Title of invention |
SYSTEM AND METHOD FOR AUTOMATIC DOCUMENT CLASSIFICATION IN EDISCOVERY, COMPLIANCE AND LEGACY INFORMATION CLEAN-UP |
Abstract |
A system, method and computer program product for automatic document classification, including an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification for display to a user. |
Publication number |
US2014156567(A1) |
Publication date |
2014.06.05 |
Application number |
US201213693075 |
Filing date |
2012.12.04 |
Applicant |
MSC INTELLECTUAL PROPERTIES B.V. |
Inventor |
Scholtes Johannes Cornelis |
Classification |
G06N99/00 |
Main classification |
G06N99/00 |
Agency |
|
Agent |
|
Independent claim |
1. A computer implemented system for automatic document classification, the system comprising:
an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information;
a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and
a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification for display to a user.
|
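The three modules recited in the claim (extraction, machine learning, classification) can be illustrated with a minimal sketch. This is not the applicant's implementation: the bag-of-words features, the nearest-centroid model, the cosine matching, the function names (extract_features, train, classify) and the sample documents are all assumptions chosen for illustration, and only the supervised case is shown.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text):
    """Extraction step (assumed): lowercase the text and count its terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(labeled_docs):
    """Machine learning step (assumed): sum term counts per category."""
    centroids = defaultdict(Counter)
    for text, label in labeled_docs:
        centroids[label].update(extract_features(text))
    return dict(centroids)


def classify(model, text):
    """Classification step: return the category whose vector matches best."""
    vector = extract_features(text)
    return max(model, key=lambda label: cosine(vector, model[label]))


# Hypothetical training documents and categories, for illustration only.
training = [
    ("Invoice for payment due on the attached purchase order", "finance"),
    ("Quarterly revenue and payment schedule", "finance"),
    ("Privileged and confidential attorney client communication", "legal"),
    ("Counsel advises on the litigation hold notice", "legal"),
]
model = train(training)
category = classify(model, "Please review the attorney notice on litigation")
print(category)
```

Summing raw counts per category is enough here because cosine similarity ignores vector length; a production system would more likely use weighted features and a trained classifier.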
地址 |
Amsterdam NL |