摘要 |
Automatic classification of different types of documents is disclosed. An image of a form or document is captured. The document is assigned to one or more type definitions by identifying one or more objects within the image of the document. A matching model is selected via identification of the document image. In the case of multiple identifications, a profound analysis of the document type is performed—either automatically or manually. An automatic classifier may be trained with document samples of each of a plurality of document classes or document types where the types are known in advance or a system of classes may be formed automatically without a priori information about types of samples. An automatic classifier determines possible features and calculates a range of feature values and possible other feature parameters for each type or class of document. A decision tree, based on rules specified by a user, may be used for classifying documents. Processing, such as optical character recognition (OCR), may be used in the classification process. |