Abstract |
PROBLEM TO BE SOLVED: To make it easy to describe, in a visually recognizable way, the varied layout features of document elements in a model, by dividing a sample document image into prescribed units to detect its layout features, designating a prescribed area of the sample document image, and assigning a prescribed attribute to it. SOLUTION: A document input means 102 reads a sample document 107 that serves as the source of the model and obtains a digital document image. A layout feature means 103 then divides the input sample document image into areas and lines, and further segments the individual characters to obtain the character size and character spacing. For the model document image thus divided into elements, a logical model preparing means 108 assigns a bibliographic item name to each element the user wishes to extract, and then assigns an attribute. A logical model is prepared by combining the attribute, the bibliographic item name, and the layout features, and is stored in a logical model managing database 109.
|
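The combination step described in the abstract (attribute, bibliographic item name, and layout features merged into a logical model and stored in a database) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class names, the field names, the pixel units, the sample values, and the in-memory dictionary standing in for the logical model managing database 109 are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LayoutFeatures:
    """Layout features detected for one element (hypothetical fields)."""
    region: Tuple[int, int, int, int]  # x, y, width, height (assumed pixels)
    line_count: int                    # number of text lines in the region
    char_size: float                   # mean character height (assumed pixels)
    char_spacing: float                # mean gap between characters


@dataclass(frozen=True)
class ModelElement:
    """One element of the sample document that the user wants to extract."""
    item_name: str          # bibliographic item name assigned by the user
    attribute: str          # attribute assigned by the user
    layout: LayoutFeatures  # features produced by the layout analysis step


@dataclass
class LogicalModel:
    """A named collection of elements built from one sample document."""
    name: str
    elements: List[ModelElement] = field(default_factory=list)

    def add_element(self, item_name: str, attribute: str,
                    layout: LayoutFeatures) -> None:
        # Combine the three pieces of information into a single element.
        self.elements.append(ModelElement(item_name, attribute, layout))


class LogicalModelDatabase:
    """In-memory stand-in for the logical model managing database."""

    def __init__(self) -> None:
        self._models: Dict[str, LogicalModel] = {}

    def store(self, model: LogicalModel) -> None:
        self._models[model.name] = model

    def load(self, name: str) -> LogicalModel:
        return self._models[name]


# Usage: build a model from hand-written (made-up) layout features.
model = LogicalModel(name="sample-report")
model.add_element("title", "centered",
                  LayoutFeatures((40, 30, 520, 48), 1, 32.0, 4.0))
model.add_element("author", "right-aligned",
                  LayoutFeatures((300, 90, 260, 20), 1, 14.0, 2.0))

database = LogicalModelDatabase()
database.store(model)
item_names = [e.item_name for e in database.load("sample-report").elements]
```

The sketch deliberately leaves out image input and layout analysis; it only shows one plausible way the three kinds of information could be bundled per element before storage.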