Abstract |
<p>Provided is an automatic document classification system capable of classifying documents with high accuracy and a small amount of calculation, even when the amount of learning data is small. The automatic document classification system classifies a new document into one of the classes into which existing documents have been classified, using a keyword series of the new input document. The system is provided with: a first storage means to store prior distribution estimation frequency data, which indicates the number of documents classified into a class (X') in a set of documents for prior distribution estimation, and the number of keywords (key<sub>i</sub>) contained in the documents classified into the class (X') in that set; a second storage means to store learning frequency data, which indicates the number of documents classified into the class (X') in a set of documents for learning, and the number of keywords (key<sub>i</sub>) contained in the documents classified into the class (X') in that set; a frequency data acquisition means; and a classification class determination means. The frequency data acquisition means is configured to acquire frequency data, when the keyword series of the new document is input, by reading the prior distribution estimation frequency data and the learning frequency data of the respective classes from the first storage means and the second storage means, respectively. The classification class determination means is configured to determine, in response to the input of the keyword series of the new document and the prior distribution estimation frequency data and the learning frequency data for the respective classes, a classification class which reliably minimizes an error rate in accordance with the Bayesian criterion, the error rate being the rate at which the new document is classified into a class into which it should not be classified, and to output the classification class determined for the new document.</p> |
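
The abstract does not give the decision formula, so the following is only a minimal sketch of one way the two frequency tables might be combined. It assumes a multinomial keyword model in which the prior distribution estimation counts are simply added to the learning counts as pseudo-counts, and it picks the class with the highest posterior score, which minimizes the expected error rate under that assumed model. The names `merged_counts`, `classify`, and `smoothing` are hypothetical and do not come from the source.

```python
from collections import Counter
from math import log


def merged_counts(prior, learning):
    """Add prior-estimation counts and learning counts class by class.

    Both inputs map class -> (number of documents, {keyword: count}).
    Summing the two tables is an assumption, not the patented method.
    """
    merged = {}
    for source in (prior, learning):
        for cls, (n_docs, key_counts) in source.items():
            entry = merged.setdefault(cls, [0, Counter()])
            entry[0] += n_docs
            entry[1].update(key_counts)  # Counter.update adds counts
    return merged


def classify(keywords, prior, learning, smoothing=1.0):
    """Return the class with the highest log-posterior score."""
    merged = merged_counts(prior, learning)
    vocab = {k for _, keys in merged.values() for k in keys} | set(keywords)
    total_docs = sum(n for n, _ in merged.values())
    best_cls, best_score = None, float("-inf")
    for cls, (n_docs, keys) in merged.items():
        total_keys = sum(keys.values())
        # Smoothed class probability from document counts.
        score = log((n_docs + smoothing) / (total_docs + smoothing * len(merged)))
        # Smoothed keyword probabilities from keyword counts.
        for k in keywords:
            score += log((keys[k] + smoothing) / (total_keys + smoothing * len(vocab)))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Illustrative data only: a large prior-estimation set, a tiny learning set.
prior = {
    "sports": (10, {"ball": 8, "team": 6}),
    "finance": (10, {"stock": 9, "market": 7}),
}
learning = {
    "sports": (1, {"goal": 1}),
    "finance": (1, {"bond": 1}),
}
print(classify(["ball", "team"], prior, learning))
```

In this sketch the larger prior-estimation set dominates the estimate while the learning set is small, which is one plausible reading of how the system stays accurate with little learning data.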