Title of invention DOCUMENT IMAGE PROCESSING METHOD AND MACHINE-READABLE RECORDING MEDIUM WHERE PROGRAM ALLOWING COMPUTER TO IMPLEMENT DOCUMENT IMAGE PROCESSING METHOD IS RECORDED
Abstract PROBLEM TO BE SOLVED: To extract character areas from a document image correctly even when characters belonging to different character areas are close to each other. SOLUTION: A document image processing method extracts character areas from a document image for character recognition and layout information acquisition, and corrects character areas that were excessively integrated during extraction. The method includes a step S201 of inputting the document image; a step S202 of reducing the input document image, extracting the circumscribed rectangles of the connected components of black pixels that constitute characters, and extracting basic elements; a step S203 of classifying the basic elements as characters, tables, figures, or others, generating lines by integrating the character elements, and extracting the character areas by integrating the lines; a step S204 of extracting column layout information from the character areas; and a step S205 of correcting over-integrated character areas by referring to the positions of the extracted columns.
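The correction in step S205 can be illustrated with a minimal sketch. The abstract does not say how the column positions are represented or how an over-integrated area is split, so everything below is an assumption made for illustration: columns are given as a list of x-coordinates of the gutters between them, each text line is an axis-aligned rectangle, and a line is assigned to the column that contains its horizontal center. The names `Rect`, `bounding_rect`, `split_area_by_columns`, and `correct_over_integration` are hypothetical and do not come from the patent.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def bounding_rect(rects: List[Rect]) -> Rect:
    """Return the smallest rectangle enclosing all given rectangles."""
    return Rect(
        min(r.left for r in rects),
        min(r.top for r in rects),
        max(r.right for r in rects),
        max(r.bottom for r in rects),
    )


def split_area_by_columns(lines: List[Rect], column_boundaries: List[int]) -> List[List[Rect]]:
    """Group the lines of one character area by the column they fall in.

    Assumption: a line belongs to the column containing its horizontal center,
    and column_boundaries holds the x-coordinates separating adjacent columns.
    """
    bounds = sorted(column_boundaries)
    groups: Dict[int, List[Rect]] = {}
    for line in lines:
        center = (line.left + line.right) / 2
        column_index = bisect_right(bounds, center)
        groups.setdefault(column_index, []).append(line)
    # Return the groups ordered from the leftmost column to the rightmost.
    return [groups[i] for i in sorted(groups)]


def correct_over_integration(areas: List[List[Rect]], column_boundaries: List[int]) -> List[Rect]:
    """Split every area that straddles a column boundary (sketch of S205).

    Each input area is a list of line rectangles; the result is one bounding
    rectangle per corrected character area.
    """
    corrected: List[Rect] = []
    for lines in areas:
        for group in split_area_by_columns(lines, column_boundaries):
            corrected.append(bounding_rect(group))
    return corrected
```

Under these assumptions, an area whose lines lie on both sides of a gutter is split into one area per column, while an area that lies entirely within one column is returned unchanged. Assigning lines by their center is only one possible rule; a line that genuinely spans several columns, such as a headline, would need separate handling.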
Publication number JP2000067158(A) Publication date 2000.03.03
Application number JP19980246519 Filing date 1998.08.18
Applicant RICOH CO LTD Inventor SAITO TAKASHI
Classification G06K9/20;(IPC1-7):G06K9/20 Main classification G06K9/20
Agency Agent
Principal claim
Address