Title of invention AN AUTOMATIC DEVICE FOR TRAINING AND CLASSIFYING DOCUMENTS BASED ON N-GRAM STATISTICS AND AN AUTOMATIC METHOD FOR TRAINING AND CLASSIFYING DOCUMENTS BASED ON N-GRAM STATISTICS THEREFOR
Abstract The present invention relates to an apparatus and method for automatically learning documents, and to an apparatus and method for automatically classifying documents, which together learn and classify large volumes of web documents automatically on the basis of n-grams. The apparatus for automatically classifying documents according to the present invention includes: a learning document pool containing a plurality of learning document groups classified by category; a preprocessing unit configured to preprocess each learning document group in the learning document pool; and an n-gram data set pool configured to store the n-gram data sets of the learning document pool, which are learned through the preprocessing performed by the preprocessing unit. The apparatus further includes: an automatic document learning unit configured to have the preprocessing unit preprocess a new document into a bigram set when a new document appears that is not identified through the learning document pool; and an automatic document classifying unit configured to compare the bigram set of the new document, formed by the preprocessing unit, with the bigram sets of the n-gram data set pool, and to allocate and store the bigram set of the new document in one of the n-gram data sets of the n-gram data set pool. [Reference numerals] (220) Automatic document classifying unit; (230) Learned n-gram data set (bigram example); (AA) Non-identified document; (BB) Appearance of a new document; (CC) Preprocessing
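The flow described in the abstract (learn one bigram set per category, then compare a new document's bigram set against each and store it in the best match) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the claimed implementation: the abstract does not say whether the bigrams are word-level or character-level, how preprocessing is done, or how sets are compared. The sketch assumes word bigrams, lowercase alphanumeric tokenization, and a simple count of shared bigrams as the comparison score. The function names `preprocess`, `bigrams`, `learn`, and `classify` are hypothetical.

```python
import re


def preprocess(text):
    """Assumed preprocessing: lowercase, then keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(tokens):
    """Return the set of adjacent token pairs (word bigrams, an assumption)."""
    return set(zip(tokens, tokens[1:]))


def learn(document_pool):
    """Build the n-gram data set pool: one bigram set per category.

    document_pool maps a category name to a list of learning documents.
    """
    data_sets = {}
    for category, documents in document_pool.items():
        category_bigrams = set()
        for document in documents:
            category_bigrams |= bigrams(preprocess(document))
        data_sets[category] = category_bigrams
    return data_sets


def classify(document, data_sets):
    """Assign a new document to the category sharing the most bigrams.

    The shared-bigram count is an assumed similarity measure. Ties go to
    the category that appears first in data_sets. The new document's
    bigrams are then stored in the chosen category's data set.
    """
    doc_bigrams = bigrams(preprocess(document))
    best = max(data_sets, key=lambda c: len(doc_bigrams & data_sets[c]))
    data_sets[best] |= doc_bigrams
    return best


if __name__ == "__main__":
    pool = {
        "sports": ["the team won the match", "the player scored a goal"],
        "finance": ["the bank raised interest rates", "stock prices fell sharply"],
    }
    sets_by_category = learn(pool)
    print(classify("the team scored a goal", sets_by_category))
```

Storing the new document's bigrams back into the chosen set mirrors the "allocate and store" step in the abstract; a real system would likely need a threshold or a normalized score so that large categories do not win by size alone.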
Publication number KR20140049659(A) Publication date 2014.04.28
Application number KR20120115730 Filing date 2012.10.18
Applicant INDUSTRY-ACADEMIC COOPERATION FOUNDATION, CHOSUN UNIVERSITY Inventors KIM, PAN KOO;CHOI, DONG JIN;KIM, JEONG IN;KO, MI AH
Classification G06F17/21;G06F17/27 Main classification G06F17/21
Agency Agent
Main claim
Address