Title: DOCUMENT INFORMATION PROCESSOR AND PROGRAM
Abstract:
PROBLEM TO BE SOLVED: In a document information processor that produces a structured document, to extract text-art portions properly as well, so that they can be included in the structured document.
SOLUTION: A picture and character separation part 200 separates an image portion G10 and a text portion G20 from an HTML document acquired by an electronic document input part 100. From the extracted character information, a text art acquisition part 300 identifies text art, i.e. a pattern formed by an arrangement of characters that carries meaning. A structured document production part 500 refers to the information G24a, G24b, or G24c on the text art identified by the text art acquisition part 300, together with the structured information G20c produced in the usual way by a text analysis part 400 or the information G10a and G10b on the image portion G10, to complete the logically structured document.
COPYRIGHT: (C)2005, JPO&NCIPI
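The abstract names the processing stages but does not say how text art is recognized. Purely as an illustration, the Python sketch below wires the three stages together under assumed rules: images are taken from `<img>` tags, and a line is treated as text art when at least half of its visible characters are symbols and it belongs to a run of two or more such lines. The names `PictureTextSeparator`, `find_text_art`, `build_structured_document`, the output keys, and both thresholds are hypothetical and are not taken from the patent.

```python
from html.parser import HTMLParser

# Hypothetical heuristics (not from the patent): a line looks like text art
# when at least half of its visible characters are symbols, and a text-art
# block must span at least two consecutive such lines.
SYMBOL_RATIO_THRESHOLD = 0.5
MIN_TEXT_ART_LINES = 2


class PictureTextSeparator(HTMLParser):
    """Splits HTML into image references and raw text (cf. part 200)."""

    def __init__(self):
        super().__init__()
        self.images = []  # plays the role of image portion G10
        self.text = []    # plays the role of text portion G20

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs).get("src", ""))

    def handle_data(self, data):
        self.text.append(data)


def is_art_line(line):
    """True if most visible characters in the line are symbols."""
    chars = [c for c in line if not c.isspace()]
    if len(chars) < 3:
        return False
    symbols = sum(1 for c in chars if not c.isalnum())
    return symbols / len(chars) >= SYMBOL_RATIO_THRESHOLD


def find_text_art(lines):
    """Returns (start, end) ranges of runs of art-like lines (cf. part 300)."""
    ranges, start = [], None
    for i, line in enumerate(lines + [""]):  # empty sentinel closes a final run
        if is_art_line(line):
            if start is None:
                start = i
        elif start is not None:
            if i - start >= MIN_TEXT_ART_LINES:
                ranges.append((start, i))
            start = None
    return ranges


def build_structured_document(html):
    """Combines images, text art and plain paragraphs (cf. part 500)."""
    sep = PictureTextSeparator()
    sep.feed(html)
    sep.close()  # flush any buffered trailing text
    lines = "".join(sep.text).splitlines()
    ranges = find_text_art(lines)
    in_art = {i for start, end in ranges for i in range(start, end)}
    return {
        "images": sep.images,
        "text_art": ["\n".join(lines[start:end]) for start, end in ranges],
        # Crude stand-in for the text analysis part 400: non-art lines.
        "paragraphs": [
            line.strip()
            for i, line in enumerate(lines)
            if i not in in_art and line.strip()
        ],
    }


sample = """<html><body>
<p>Quarterly report</p>
<img src="chart.png">
<pre>
+-----+-----+
| A   | B   |
+-----+-----+
</pre>
<p>Sales rose 5%.</p>
</body></html>"""

doc = build_structured_document(sample)
print(doc["images"])      # ['chart.png']
print(doc["paragraphs"])  # ['Quarterly report', 'Sales rose 5%.']
```

Keying the test on symbol density keeps a box drawing like the one in `sample` out of the paragraph list, where a plain text analyzer would otherwise treat it as prose; a practical system would likely need a stronger classifier than this single ratio.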
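Application publication number: JP2004287992(A)
Publication date: 2004.10.14
Application number: JP20030080838
Filing date: 2003.03.24
Applicant: FUJI XEROX CO LTD
Inventor: ITO YASUHIRO
Classification: G06F17/21; G06K9/20
Main classification: G06F17/21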