Title of Invention |
AN AUTOMATIC DEVICE FOR TRAINING AND CLASSIFYING DOCUMENTS BASED ON N-GRAM STATISTICS AND AN AUTOMATIC METHOD FOR TRAINING AND CLASSIFYING DOCUMENTS BASED ON N-GRAM STATISTICS THEREFOR |
Abstract |
The present invention relates to an apparatus and method for automatically learning documents, and to an apparatus and method for automatically classifying documents, which can automatically learn and classify large volumes of web documents through an n-gram-based learning and classification process. The automatic document classification apparatus according to the present invention includes: a learning document pool comprising a plurality of learning document groups classified by category; a preprocessing unit configured to preprocess each learning document group in the learning document pool; and an n-gram data set pool configured to store the n-gram data sets of the learning document pool, which are formed by learning from the output of the preprocessing unit. The apparatus further includes: an automatic document learning unit configured, when a new document appears that is not identified through the learning document pool, to have the preprocessing unit preprocess the new document into a bigram set; and an automatic document classifying unit configured to compare the bigram set of the new document with the bigram sets in the n-gram data set pool and to allocate and store the bigram set of the new document in one of the n-gram data sets of the pool. [Reference numerals] (220) Automatic document classifying unit; (230) Learned n-gram data set (bigram example); (AA) Non-identified document; (BB) Appearance of a new document; (CC) Preprocessing
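The flow described in the abstract (preprocess, build per-category bigram sets, compare a new document's bigrams against them, then store the new bigrams in the matching set) can be illustrated with a minimal Python sketch. The abstract does not state the tokenisation rules or the comparison measure, so the regex tokeniser, the overlap-ratio score, and all function names and sample documents below are assumptions made only for illustration.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase and split into word tokens (an assumed stand-in for the preprocessing unit)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


def learn(pool):
    """Build one bigram Counter per category from {category: [documents]}."""
    return {
        category: Counter(bg for doc in docs for bg in bigrams(preprocess(doc)))
        for category, docs in pool.items()
    }


def classify(text, ngram_pool):
    """Pick the category whose bigram set overlaps most with the new document,
    then store the document's bigrams in that category's data set."""
    doc_bigrams = bigrams(preprocess(text))
    if not doc_bigrams:
        return None  # nothing to compare

    def overlap(category):
        # Assumed score: share of the document's bigrams already known to the category.
        known = ngram_pool[category]
        return sum(1 for bg in doc_bigrams if bg in known) / len(doc_bigrams)

    best = max(ngram_pool, key=overlap)
    ngram_pool[best].update(doc_bigrams)  # allocate and store the new bigram set
    return best


# Hypothetical learning document pool, grouped by category.
learning_pool = {
    "sports": ["The team won the match", "The striker scored a late goal"],
    "finance": ["The bank raised interest rates", "Stock prices fell sharply"],
}
ngram_pool = learn(learning_pool)
label = classify("The bank cut interest rates", ngram_pool)
print(label)
```

In this sketch a document that shares no bigrams with any category is still assigned to the first category in the pool; a practical system would likely need a minimum-score threshold, which the abstract does not specify.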
Publication Number |
KR20140049659(A) |
Publication Date |
2014.04.28 |
Application Number |
KR20120115730 |
Filing Date |
2012.10.18 |
Applicant |
INDUSTRY-ACADEMIC COOPERATION FOUNDATION, CHOSUN UNIVERSITY |
Inventors |
KIM, PAN KOO;CHOI, DONG JIN;KIM, JEONG IN;KO, MI AH |
Classification |
G06F17/21;G06F17/27 |
Main Classification |
G06F17/21 |
Agency |
|
Agent |
|
Main Claim |
|
Address |
|