Abstract
<p>An input document is classified into at least one document class. First, terms for use in the classification are selected from among the terms that occur in the document. A similarity between the input document and each class is then calculated using information stored beforehand for each document class. The calculated similarity to each class is corrected, and the class or classes to which the input document belongs are determined from the corrected similarities. An apparatus that performs the classification comprises a document input unit (110), a data processing unit (120), a classification engine (130), a classification information unit (140), and a classification output unit (150), and in particular includes: a selector that selects, from among the terms occurring in the document entered into the document input unit, the terms used for the classification; a calculator that calculates the similarity between the input document and each class using the information stored beforehand for each document class; a corrector that corrects the similarity to each class; and a determinator that determines and outputs the class to which the input document belongs according to the corrected similarity to each class.</p>
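<p>The four steps named in the abstract (select terms, calculate per-class similarity, correct it, determine the class) can be illustrated with a minimal Python sketch. The abstract does not specify the term-selection rule, the form of the stored class information, the similarity measure, or the correction, so every one of those is an assumption here: a stop-word filter for selection, per-class term-weight profiles as the stored information, cosine similarity, and a subtractive per-class offset as the correction. The names <code>select_terms</code>, <code>cosine</code>, <code>classify</code>, <code>profiles</code>, <code>offsets</code>, and <code>threshold</code> are hypothetical.</p>

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say how terms are selected.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "for", "on"}


def select_terms(text: str) -> Counter:
    """Selector (assumed rule): keep lowercase word tokens that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(u: dict, v: dict) -> float:
    """Calculator (assumed measure): cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(text: str, profiles: dict, offsets: dict, threshold: float = 0.2) -> list:
    """Return every class whose corrected similarity reaches `threshold`, best first.

    Falls back to the single best class so that at least one class is always
    returned. `profiles` (class -> term weights, stored beforehand) must be non-empty.
    """
    terms = select_terms(text)
    # Similarity to each class, using the per-class profile stored beforehand.
    raw = {c: cosine(terms, prof) for c, prof in profiles.items()}
    # Corrector (assumed form): subtract a stored per-class offset.
    corrected = {c: s - offsets.get(c, 0.0) for c, s in raw.items()}
    # Determinator: keep classes above the threshold, else take the argmax.
    chosen = [c for c, s in corrected.items() if s >= threshold]
    if not chosen:
        chosen = [max(corrected, key=corrected.get)]
    return sorted(chosen, key=lambda c: -corrected[c])


if __name__ == "__main__":
    # Toy, made-up class profiles and offsets for illustration only.
    profiles = {
        "sports": {"match": 2.0, "team": 3.0, "goal": 1.0},
        "finance": {"stock": 3.0, "market": 2.0, "price": 1.0},
    }
    offsets = {"sports": 0.05, "finance": 0.0}
    print(classify("The team scored a goal in the match", profiles, offsets))
```

<p>Returning a list rather than a single label reflects the abstract's "at least one document class": with the threshold rule a document can fall into several classes, and the argmax fallback guarantees it is never left unclassified.</p>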