Abstract |
PROBLEM TO BE SOLVED: To generate a structured document, such as an XML (Extensible Markup Language) or HTML (Hypertext Markup Language) document, in which document logical elements other than body text, such as graphs and tables, contained in a printed document consisting of a plurality of pages are placed at appropriate positions. SOLUTION: A layout analyzing part 11 analyzes the layout of a document image corresponding to the printed document and extracts paragraph areas and graph/table areas, and a character recognizing part 12 segments and recognizes the characters in the paragraph areas. The character recognition result and the layout analysis result are supplied to a document logical element extracting part 13, which extracts document logical element areas from the paragraph areas, and a reading order setting part 14 assigns a reading order to the document logical element areas and the graph/table areas. A document structure analyzing part 16 then extracts the document structure by grouping the document logical element areas and the graph/table areas. The appearance position of each area corresponding to a document logical element other than body text is changed within the document structure, and the result is supplied to a document output part 17, which generates the structured document. COPYRIGHT: (C)2004,JPO |
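
The flow described in the abstract (reading order setting, grouping into a document structure, repositioning of non-text elements, structured output) can be illustrated with a minimal Python sketch. It is not the patented method: the `Area` record, the sort key (page, then top, then left), the rule that a heading opens a new section, and the rule that graphs and tables are moved to the end of their section are all assumptions made for illustration. The abstract does not specify any of these details, and layout analysis and character recognition are assumed to have already produced the areas.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

TEXT_KINDS = ("heading", "paragraph")  # assumed labels for body-text elements


@dataclass
class Area:
    """Hypothetical record for one area found by layout analysis."""
    kind: str        # "heading", "paragraph", "graph", or "table"
    page: int
    top: float
    left: float
    text: str = ""   # recognized text, or a caption for a graph/table


def reading_order(areas):
    # Assumed ordering rule: by page, then top to bottom, then left to right.
    return sorted(areas, key=lambda a: (a.page, a.top, a.left))


def group_sections(ordered):
    # Assumed grouping rule: each heading starts a new section.
    sections = []
    for area in ordered:
        if area.kind == "heading" or not sections:
            sections.append([])
        sections[-1].append(area)
    return sections


def relocate_non_text(section):
    # Assumed repositioning rule: keep text in reading order and move
    # graphs/tables to the end of the section, so that a paragraph split
    # across pages is not interrupted by them.
    body = [a for a in section if a.kind in TEXT_KINDS]
    others = [a for a in section if a.kind not in TEXT_KINDS]
    return body + others


def to_xml(areas):
    root = ET.Element("document")
    for section in group_sections(reading_order(areas)):
        sec = ET.SubElement(root, "section")
        for area in relocate_non_text(section):
            ET.SubElement(sec, area.kind).text = area.text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    demo = [
        Area("heading", 1, 0, 0, "Introduction"),
        Area("paragraph", 1, 10, 0, "First half of the text"),
        Area("graph", 1, 50, 0, "Graph 1"),
        Area("paragraph", 2, 0, 0, "second half of the text"),
    ]
    print(to_xml(demo))
```

In the demo, the graph at the bottom of page 1 sits between two pieces of text in raw reading order. The sketch emits it after both, which is one plausible reading of "changing the appearance position" of a non-text element.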