SYSTEM AND METHOD FOR AUTOMATIC DOCUMENT CLASSIFICATION IN EDISCOVERY, COMPLIANCE AND LEGACY INFORMATION CLEAN-UP

Title of invention	SYSTEM AND METHOD FOR AUTOMATIC DOCUMENT CLASSIFICATION IN EDISCOVERY, COMPLIANCE AND LEGACY INFORMATION CLEAN-UP
Abstract	A system, method and computer program product for automatic document classification, including an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification for display to a user.
Publication No.	US2014156567(A1)	Publication date	2014.06.05
Application No.	US201213693075	Filing date	2012.12.04
Applicant	MSC INTELLECTUAL PROPERTIES B.V.	Inventor	Scholtes Johannes Cornelis
Classification	G06N99/00	Main classification	G06N99/00
Agency		Agent
Independent claim	1. A computer implemented system for automatic document classification, the system comprising: an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification for display to a user.
Address	Amsterdam NL
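
The three modules named in the abstract and in claim 1 (extraction, machine learning, classification) can be illustrated with a minimal sketch. This is not the patented implementation: the record does not say which features or learning algorithm are used, so the sketch assumes lowercase word tokens as the normalized information, bag-of-words counts as feature vectors, a per-category summed vector as the model representation, and cosine similarity for matching. All function names (`extract`, `feature_vector`, `train`, `cosine`, `classify`) are hypothetical.

```python
import math
import re
from collections import Counter


def extract(text):
    """Extraction module (assumed): tokenize and normalize to lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def feature_vector(tokens):
    """Build a bag-of-words feature vector (term -> count) from normalized tokens."""
    return Counter(tokens)


def train(labeled_docs):
    """Machine learning module (assumed): sum the feature vectors of each category.

    labeled_docs is an iterable of (text, category) pairs; the result maps each
    category to one aggregate vector, which serves as the model representation.
    """
    model = {}
    for text, category in labeled_docs:
        model.setdefault(category, Counter()).update(feature_vector(extract(text)))
    return model


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(model, text):
    """Classification module (assumed): return the category whose vector matches best."""
    vec = feature_vector(extract(text))
    return max(model, key=lambda category: cosine(vec, model[category]))
```

In use, `train` would be called once on a set of already-labeled documents, and `classify` would then be called on each non-classified document selected from the collection. A production system would likely add richer structural and semantic features and a stronger learner; the centroid-style matching here is only the simplest choice that fits the claim's wording.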