Title of invention: DOCUMENT RECOGNITION DEVICE
Abstract: PROBLEM TO BE SOLVED: To grasp the structural features of a document from its recognition result by optically reading and recognizing characters, then analyzing the logical structure of the document from the obtained image information and character information according to a specific rule. SOLUTION: An OCR part 11 of an optical image reader 10 reads the document to be recognized. An image information recognition part 12 analyzes the image of the read document and recognizes image information other than character information, such as ruled lines and underlines. A character position information recognition part 133 recognizes the positions at which characters appear from the image information and segments the character patterns. A font information recognition part 132 recognizes character font information, namely whether each character is printed or handwritten, in conjunction with the conversion into character codes performed by the character recognition of a character code conversion part 131. A DTD generation part 21 of a document analyzing device 20 then analyzes the logical structure of the document according to the specific rule and generates a document type definition (DTD) that determines the structure or format of the document.
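The final step of the abstract, generating a DTD from recognized character and image information, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not disclose the "specific rule", so the `RecognizedLine` record, the `classify` rules (underlined printed text is a heading, handwritten text is an entry, other printed text is a label), and the element names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class RecognizedLine:
    """Hypothetical record of one text line after recognition."""
    text: str
    font: str         # "printed" or "handwritten" (font information)
    underlined: bool  # image information other than characters
    y: int            # vertical position on the page (position information)


def classify(line: RecognizedLine) -> str:
    """Map a line to a logical element name.

    Assumed rule set; the abstract only refers to "a specific rule".
    """
    if line.font == "printed" and line.underlined:
        return "heading"
    if line.font == "handwritten":
        return "entry"
    return "label"


def generate_dtd(lines: list[RecognizedLine], root: str = "document") -> str:
    """Build a simple DTD listing the element types found on the page."""
    # Collect element names in order of first appearance, top to bottom.
    names: list[str] = []
    for line in sorted(lines, key=lambda item: item.y):
        name = classify(line)
        if name not in names:
            names.append(name)
    if not names:
        return f"<!ELEMENT {root} EMPTY>"
    # The root may contain any of the observed element types, in any order.
    decls = [f"<!ELEMENT {root} ({' | '.join(names)})*>"]
    decls += [f"<!ELEMENT {name} (#PCDATA)>" for name in names]
    return "\n".join(decls)


if __name__ == "__main__":
    page = [
        RecognizedLine("Application Form", "printed", True, 10),
        RecognizedLine("Name:", "printed", False, 40),
        RecognizedLine("Taro", "handwritten", False, 42),
    ]
    print(generate_dtd(page))
```

The sketch keeps the rule set in a single function so that a different rule, for example one that also uses ruled lines to detect table cells, could be swapped in without touching the DTD generation step.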
Publication number: JPH1049522(A)  Publication date: 1998.02.20
Application number: JP19960204476  Filing date: 1996.08.02
Applicant: FUJITSU LTD  Inventors: MATSUOKA HIDETATSU;MATSUI KUNIO
Classification: G06K9/20;G06F17/21;G06F17/27  Main classification: G06K9/20
Agency:  Agent:
Main claim:
Address: