Title of invention DOCUMENT PROCESSOR AND DOCUMENT PROCESSING METHOD
Abstract PROBLEM TO BE SOLVED: To generate a structured document, such as an XML (Extensible Markup Language) or HTML (Hypertext Markup Language) document, in which document logical elements other than body text, such as graphs and tables, contained in a printed document consisting of a plurality of pages are placed at appropriate positions. SOLUTION: A layout analyzing part 11 analyzes the layout of a document image corresponding to the printed document and extracts paragraph areas and graph/table areas, and a character recognizing part 12 segments and recognizes the characters in the paragraph areas. The character recognition result and the layout analysis result are supplied to a document logical element extracting part 13, which extracts document logical element areas from the paragraph areas, and a reading order setting part 14 assigns a reading order to the document logical element areas and to the graph/table areas. A document structure analyzing part 16 then extracts a document structure by grouping the document logical element areas and the graph/table areas, the appearance positions of the areas corresponding to document logical elements other than body text are changed within the document structure, and the result is supplied to a document output part 17, which generates the structured document. COPYRIGHT: (C)2004,JPO
Publication number JP2003288334(A) Publication date 2003.10.10
Application number JP20020093092 Filing date 2002.03.28
Applicant TOSHIBA CORP Inventor ISHITANI YASUTO
Classification G06F17/21;G06K9/20 Main classification G06F17/21
Agency Agent
Main claim
Address
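
The last steps described in the abstract (grouping areas into a document structure, changing the position of graph/table areas, and emitting a structured document) can be illustrated with a minimal Python sketch. The abstract does not say how the new positions are chosen, so everything specific here is an assumption made for illustration: the `Region` type, the region labels, the rule that a heading starts a new group, and the policy of moving graphs and tables to the end of their group so that they no longer interrupt running text. It is not the method claimed in the patent.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Region:
    kind: str   # hypothetical labels: "heading", "paragraph", "figure", "table"
    order: int  # reading order assigned by an earlier stage
    text: str   # recognized text, or a caption for a figure/table


# Non-text logical elements that may be repositioned (assumed set).
FLOAT_KINDS = {"figure", "table"}


def group_into_sections(regions):
    """Group regions in reading order; each heading starts a new section."""
    sections = []
    for region in sorted(regions, key=lambda r: r.order):
        if region.kind == "heading" or not sections:
            sections.append([])
        sections[-1].append(region)
    return sections


def relocate_floats(section):
    """Assumed policy: keep text in order, move figures/tables to the section end."""
    body = [r for r in section if r.kind not in FLOAT_KINDS]
    floats = [r for r in section if r.kind in FLOAT_KINDS]
    return body + floats


def to_xml(regions):
    """Serialize the grouped, reordered regions as a small XML document."""
    root = ET.Element("document")
    for section in group_into_sections(regions):
        section_el = ET.SubElement(root, "section")
        for region in relocate_floats(section):
            ET.SubElement(section_el, region.kind).text = region.text
    return ET.tostring(root, encoding="unicode")


# A table that splits a sentence across two paragraph areas.
regions = [
    Region("heading", 1, "Results"),
    Region("paragraph", 2, "The method is evaluated on"),
    Region("table", 3, "Table 1"),
    Region("paragraph", 4, "scanned multi-page documents."),
]
xml_text = to_xml(regions)
print(xml_text)
```

In this example the table sits between two paragraph areas in reading order; after relocation it follows both paragraphs inside the same `section` element, so the text reads continuously in the generated XML.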