Abstract |
PROBLEM TO BE SOLVED: To minimize computation and storage requirements by using word length distribution information as the basis for classification. SOLUTION: In step 202, an image of the document to be classified is scanned. In step 204, the text is located within the document using a standard page segmentation technique. In step 206, the length of each word in the text, that is, its number of characters, is determined. In step 208, a probability distribution over word length is established. In step 210, classification is performed based on the word length distribution information established in step 208. Preferably, the feature vector established in step 208 is compared with a predetermined representative feature vector established for each category under consideration, and the category represented by the closest feature vector is assigned to the document. |
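
Steps 206 to 210 can be illustrated with a minimal sketch. It assumes the text has already been recovered as a character string, whereas the abstract works from a scanned image. The function names, the cap of 15 on word length, the whitespace tokenization, and the choice of Euclidean distance as the measure of "closest" are all assumptions made for illustration and are not stated in the abstract.

```python
from collections import Counter
import math

MAX_LEN = 15  # assumed cap; longer words are counted in the last bin


def word_length_distribution(text, max_len=MAX_LEN):
    """Steps 206-208: return a vector p where p[i] is the fraction of
    words that are i + 1 characters long."""
    # Whitespace splitting is a simplification; punctuation attached to a
    # word is counted as part of its length.
    lengths = [min(len(word), max_len) for word in text.split()]
    if not lengths:
        return [0.0] * max_len
    counts = Counter(lengths)
    total = len(lengths)
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


def classify(vector, representatives):
    """Step 210: return the category whose representative feature vector
    is closest to `vector` (Euclidean distance, an assumption)."""
    return min(
        representatives,
        key=lambda category: math.dist(vector, representatives[category]),
    )


# Hypothetical representative vectors, built here from toy strings.
representatives = {
    "short_words": word_length_distribution("a an it is to be or on"),
    "long_words": word_length_distribution(
        "classification representative distribution"
    ),
}
category = classify(word_length_distribution("it is on to"), representatives)
```

Each document is reduced to a single fixed-length vector and each category to one representative vector, so classification costs one distance computation per category. This is consistent with the stated aim of keeping computation and storage small.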