Title of invention AUTOMATIC DOCUMENT CLASSIFICATION SYSTEM
Abstract <p>Provided is an automatic document classification system capable of classifying documents with high accuracy and little computation even when the amount of learning data is small. The system classifies a new document into one of the classes into which existing documents have been classified, using the keyword series of the newly input document. The system comprises: a first storage means that stores prior distribution estimation frequency data, which indicates the number of documents classified into a class (X') in a set of documents for prior distribution estimation and the number of keywords (keyi) contained in the documents classified into the class (X') in that set; a second storage means that stores learning frequency data, which indicates the number of documents classified into the class (X') in a set of documents for learning and the number of keywords (keyi) contained in the documents classified into the class (X') in that set; a frequency data acquisition means; and a classification class determination means. When the keyword series of the new document is input, the frequency data acquisition means reads the prior distribution estimation frequency data and the learning frequency data of the respective classes from the first storage means and the second storage means, respectively. In response to the keyword series of the new document and the prior distribution estimation frequency data and learning frequency data of the respective classes, the classification class determination means determines the classification class that reliably minimizes the error rate in accordance with the Bayesian criterion, the error rate being the rate at which the new document is classified into a class into which it should not be classified, and outputs the classification class determined for the new document.</p>
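The abstract does not give the decision formula, so the following is only a minimal sketch of one way such a classifier could work, not the method claimed in the application. It assumes a Dirichlet-multinomial model in which the prior distribution estimation counts act as pseudo-counts added to the learning counts, and it picks the class with the highest posterior predictive probability, which minimizes the expected error rate under 0-1 loss for that assumed model. The names classify, prior_freq, learning_freq, vocab_size and alpha are hypothetical.

```python
from collections import Counter
from math import log


def classify(keywords, prior_freq, learning_freq, vocab_size, alpha=1.0):
    """Return the class with the highest posterior predictive score.

    prior_freq and learning_freq (hypothetical structures) map each class
    to a pair: (number of documents, Counter of keyword counts).
    alpha is an assumed smoothing constant, not taken from the source.
    """
    classes = sorted(set(prior_freq) | set(learning_freq))
    empty = (0, Counter())
    total_docs = sum(
        prior_freq.get(c, empty)[0] + learning_freq.get(c, empty)[0]
        for c in classes
    )
    best_class, best_score = None, float("-inf")
    for c in classes:
        p_docs, p_kw = prior_freq.get(c, empty)
        l_docs, l_kw = learning_freq.get(c, empty)
        # Pool both sources: prior-estimation counts serve as pseudo-counts.
        counts = p_kw + l_kw
        total = sum(counts.values())
        # Log probability of the class itself, from pooled document counts.
        score = log((p_docs + l_docs + alpha)
                    / (total_docs + alpha * len(classes)))
        # Sequential predictive probability of the keyword series.
        seen = Counter()
        for n, key in enumerate(keywords):
            score += log((counts[key] + seen[key] + alpha)
                         / (total + n + alpha * vocab_size))
            seen[key] += 1
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Because only per-class document counts and keyword counts are read at classification time, the cost per new document is proportional to the number of classes times the length of its keyword series, which is in line with the low-computation aim stated in the abstract.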
Publication number WO2010101005(A1) Publication date 2010.09.10
Application number WO2010JP51917 Filing date 2010.02.10
Applicant NATIONAL UNIVERSITY CORPORATION KITAMI INSTITUTE OF TECHNOLOGY;MAEDA, YASUNARI Inventor MAEDA, YASUNARI
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim
Address