Abstract |
A classification system having a controller, a document storage memory, and a document input is used to classify documents. The controller is programmed to generate a theme score for each class from a plurality of pre-classified source documents. A theme score is also generated for the unclassified document. The theme score of the unclassified document is compared with the theme scores of the various classes, and the unclassified document is assigned to the class having the nearest theme score. Manually identified misclassified documents may be used to improve the classification system. Different sections of a patent document (e.g., abstract, description, claims) may be given different weights for classification.
|
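The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a theme score is computed or how "nearest" is measured, so everything here is an assumption for illustration: the theme score is taken to be a section-weighted term-count vector, nearness is cosine similarity, and the names `SECTION_WEIGHTS`, `theme_score`, `class_theme_score`, and `classify` as well as the weight values are hypothetical.

```python
import math
from collections import Counter

# Hypothetical section weights; the abstract only says that sections
# may be weighted differently, not what the weights are.
SECTION_WEIGHTS = {"abstract": 3.0, "claims": 2.0, "description": 1.0}


def theme_score(doc):
    """Assumed theme score: a weighted term-count vector.

    `doc` maps a section name to its text.
    """
    score = Counter()
    for section, text in doc.items():
        weight = SECTION_WEIGHTS.get(section, 1.0)
        for term in text.lower().split():
            score[term] += weight
    return score


def class_theme_score(docs):
    """Sum the theme scores of the pre-classified documents of one class."""
    total = Counter()
    for doc in docs:
        total.update(theme_score(doc))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc, class_scores):
    """Return the class whose theme score is nearest to the document's."""
    target = theme_score(doc)
    return max(class_scores, key=lambda name: cosine(target, class_scores[name]))


# Toy pre-classified source documents (invented for this example).
training = {
    "optics": [{"abstract": "lens focuses light", "claims": "a lens assembly"}],
    "engines": [{"abstract": "piston drives crankshaft", "claims": "a piston engine"}],
}
class_scores = {name: class_theme_score(docs) for name, docs in training.items()}
predicted = classify({"abstract": "a lens bends light"}, class_scores)
```

Under these assumptions, a manually identified misclassified document could be fed back by appending it to the document list of its correct class and recomputing that class's theme score.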