Abstract |
PROBLEM TO BE SOLVED: To extract and structure the content written in a printed document and to input it into a computer automatically. SOLUTION: A document processor is provided with a means 1 for extracting layout objects and layout structure from a document image; a means 3 for extracting logical objects such as paragraphs, lists, mathematical formulas, programs, and annotations, based on typography, from the text areas extracted from the document image; a means 5 for extracting a plurality of reading orders among the objects; and a means 4 for extracting the logical structure by applying a model defined in advance for the logical objects. Primary information and secondary information are extracted from a document made up of a plurality of varied pages, each composed of characters, photographs, graphics, and lists. The information is converted into various electronic formats. Thus, a document management system can be constructed automatically, and various computer applications can be used effectively. |
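
As a rough illustration of two of the steps named above (typography-based extraction of logical objects, and extraction of more than one reading order), the following minimal Python sketch may help. It is not the patented method: the `TextBlock` fields, the classification rules in `classify`, the `body_size` and `column_split` parameters, and the two orderings in `candidate_reading_orders` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """A text area already cut out of the page image (hypothetical structure)."""
    text: str
    x: float            # left edge of the block
    y: float            # top edge of the block
    font_size: float
    monospace: bool = False


def classify(block: TextBlock, body_size: float = 10.0) -> str:
    """Guess a logical object type from typographic cues.

    The rules are illustrative assumptions, not taken from the source.
    """
    if block.monospace:
        return "program"
    if block.font_size < body_size:
        return "annotation"
    if block.text.lstrip().startswith(("-", "*", "\u2022")):
        return "list"
    return "paragraph"


def candidate_reading_orders(blocks, column_split: float):
    """Return two candidate reading orders for a two-column page.

    "column_first" reads the left column top to bottom, then the right one;
    "row_first" reads across the page, row by row.
    """
    column_first = sorted(blocks, key=lambda b: (b.x >= column_split, b.y))
    row_first = sorted(blocks, key=lambda b: (b.y, b.x))
    return {"column_first": column_first, "row_first": row_first}
```

A later stage could then compare the candidate orders against a predefined model of the logical objects and keep the one that fits best; that selection step is left out of the sketch.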