Title of invention Document classification method and apparatus
Abstract A document is classified into at least one document class by selecting terms for use in the classification from among the terms that occur in the document. A similarity between the input document and each class is calculated using information saved beforehand for every document class. The calculated similarity to each class is corrected. The class to which the input document belongs is determined in accordance with the corrected similarity to each class. Apparatus for effecting the classification comprises a document input unit (110), a data processing unit (120), a classification engine (130), a classification information unit (140), and a classification output unit (150) to classify a given input document, and in particular: a selector to select terms for use in the classification from among terms that occur in the input document entered into the document input unit; a calculator to calculate a similarity between the input document and each class using information saved for every document class beforehand; a corrector to correct the similarity; and a determinator to determine and output the class to which the input document belongs in accordance with the corrected similarity to each class.
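The four steps named in the abstract (select terms, calculate similarity, correct similarity, determine class) can be illustrated with a minimal sketch. The abstract does not say which similarity measure, correction, or decision rule is used, so everything concrete here is an assumption made for illustration: cosine similarity over term counts, a per-class additive correction, and a fixed threshold with a best-class fallback. The function names, the `corrections` table, and the `threshold` parameter are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def select_terms(tokens, vocabulary):
    """Selector: keep only the tokens that appear in the chosen vocabulary."""
    return [t for t in tokens if t in vocabulary]


def cosine_similarity(a, b):
    """Calculator: cosine similarity between two term-count mappings.

    ASSUMPTION: the abstract does not name a similarity measure;
    cosine similarity is used here purely as a stand-in.
    """
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(tokens, class_profiles, vocabulary, corrections, threshold=0.5):
    """Return (chosen_classes, corrected_similarities) for one document.

    class_profiles: class name -> term counts saved beforehand for that class.
    corrections:    class name -> additive offset (ASSUMPTION: the abstract
                    only says the similarity "is corrected", not how).
    threshold:      hypothetical cut-off for accepting a class.
    """
    doc_vector = Counter(select_terms(tokens, vocabulary))

    corrected = {}
    for name, profile in class_profiles.items():
        raw = cosine_similarity(doc_vector, profile)
        # Corrector: apply the per-class offset to the raw similarity.
        corrected[name] = raw + corrections.get(name, 0.0)

    # Determinator: accept every class at or above the threshold; if none
    # qualifies, fall back to the single best class so that the document
    # is assigned to at least one class.
    chosen = [name for name, score in corrected.items() if score >= threshold]
    if not chosen:
        chosen = [max(corrected, key=corrected.get)]
    return chosen, corrected


# Toy data, invented for this example only.
profiles = {
    "sports": Counter({"ball": 3, "team": 2}),
    "finance": Counter({"stock": 3, "market": 2}),
}
vocab = {"ball", "team", "stock", "market"}
offsets = {"sports": 0.0, "finance": -0.1}

classes, scores = classify("the team kicked the ball".split(), profiles, vocab, offsets)
print(classes, scores)
```

In this sketch the per-class information "saved beforehand" is simply a term-count profile; a real system would derive the profiles and the correction values from training documents.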
Publication number EP1365329(A3) Publication date 2006.11.22
Application number EP20030251175 Filing date 2003.02.26
Applicant HEWLETT-PACKARD COMPANY Inventor KAWATANI, TAKAHIKO
Classification G06K9/62;G06F17/30 Main classification G06K9/62
Agency Agent
Principal claim
Address