Title of invention: METHOD FOR CLASSIFYING DOCUMENT BY WORD LENGTH DISTRIBUTION STATE ANALYSIS, RECORDING MEDIUM RECORDING THE SAME AND COMPUTER SYSTEM FOR EXECUTING THE SAME
Abstract: PROBLEM TO BE SOLVED: To minimize computation and the use of storage resources by using word length distribution information as the basis for classification. SOLUTION: In step 202, an image of the document to be classified is scanned. In step 204, text is located in the document using a standard page segmentation technique. In step 206, the length of each word in the text, that is, its number of characters, is determined. In step 208, a probability distribution over word lengths is computed. In step 210, the document is classified based on the word length distribution information computed in step 208. Preferably, the feature vector computed in step 208 is compared with predetermined representative feature vectors, one for each category under consideration, and the category represented by the closest feature vector is assigned to the document.
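Steps 206 to 210 of the abstract can be illustrated with a minimal Python sketch. It assumes the text has already been extracted, so scanning and page segmentation (steps 202 and 204) are omitted. The function names, the 15-bin cap on word length, and the choice of Euclidean distance as the measure of "closest" are all illustrative assumptions; the abstract does not specify them.

```python
from collections import Counter
import math
import re

MAX_LEN = 15  # assumption: words longer than this are pooled into the last bin


def word_length_distribution(text, max_len=MAX_LEN):
    """Steps 206-208: count characters per word, then normalize the counts
    into a probability distribution over lengths 1..max_len."""
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(min(len(w), max_len) for w in words)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * max_len
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


def euclidean(a, b):
    """Distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(text, representatives):
    """Step 210: assign the category whose representative feature vector
    is closest to the document's word length distribution."""
    vec = word_length_distribution(text)
    return min(representatives, key=lambda cat: euclidean(vec, representatives[cat]))
```

Here `representatives` is a hypothetical dictionary mapping each category name to a representative feature vector, which could for instance be built by applying `word_length_distribution` to sample text from that category. Each document is reduced to a vector of only `MAX_LEN` numbers, which is consistent with the stated aim of keeping computation and storage low.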
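Publication number: JPH10111867(A) Publication date: 1998.04.28
Application number: JP19970158910 Filing date: 1997.06.16
Applicant: RICOH CO LTD Inventor: JONATHAN J HAL
Classification: G06F17/21;G06F17/27;G06F17/30 Main classification: G06F17/21
Agency: Agent:
Main claim:
Address: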