Title of invention Method and apparatus for extracting a title from a scanned document
Abstract A title extracting apparatus scans black pixels in a document image and extracts, as character rectangles, the rectangular regions that circumscribe connected regions of black pixels. The apparatus then unifies adjoining character rectangles and extracts, as character string rectangles, the rectangular regions that circumscribe them. Thereafter, it assigns each character string rectangle points representing its likelihood of being a title, based on attributes such as an underline attribute, a frame attribute, and a ruled line attribute, on the position of the rectangle in the document image, and on the mutual positional relations among the rectangles, and it extracts the character string rectangle with the highest points as the title rectangle. For a tabulated document, the apparatus can extract a title rectangle from inside the table. Characters in the title rectangle are recognized by a character recognition process and used as keywords for the document image.
Publication number EP0762730(A3) Publication date 1998.01.28
Application number EP19960112721 Filing date 1996.08.07
Applicant FUJITSU LIMITED Inventor KATSUYAMA, YUTAKA;NAOI, SATOSHI
Classification G06K9/20;G06T11/60 Main classification G06K9/20
Agency Agent
Principal claim
Address
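
The pipeline outlined in the abstract (connected black-pixel regions, character rectangles, character string rectangles, point scoring, highest-scoring rectangle) can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the `max_gap` merge threshold, and the point weights are all hypothetical, and the scoring uses position only, leaving out the underline, frame, and ruled line attributes the abstract mentions. The image is assumed to be a list of rows in which 1 marks a black pixel.

```python
from collections import deque


def character_rectangles(image):
    """Bounding boxes (top, left, bottom, right) of 8-connected black-pixel regions."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected region.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def string_rectangles(boxes, max_gap=2):
    """Greedily merge boxes that overlap vertically and lie within max_gap columns.

    max_gap is a hypothetical threshold; a single left-to-right pass is a
    simplification and may miss merges that a full implementation would make.
    """
    merged = []
    for box in sorted(boxes, key=lambda b: b[1]):  # process left to right
        for i, m in enumerate(merged):
            overlaps_vertically = box[0] <= m[2] and m[0] <= box[2]
            if overlaps_vertically and box[1] - m[3] - 1 <= max_gap:
                merged[i] = (min(m[0], box[0]), m[1],
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged


def title_points(rect, page_height, page_width):
    """Hypothetical position-only scoring: upper third +2, roughly centered +1."""
    top, left, bottom, right = rect
    points = 0
    if top < page_height / 3:
        points += 2
    center = (left + right) / 2
    if abs(center - (page_width - 1) / 2) <= 0.1 * page_width:
        points += 1
    return points


def extract_title(image):
    """Return the string rectangle with the most points, or None if the page is blank."""
    rects = string_rectangles(character_rectangles(image))
    height, width = len(image), len(image[0])
    return max(rects, key=lambda r: title_points(r, height, width), default=None)
```

A real system would replace the greedy merge with a proper adjacency analysis and would add points for the attribute and mutual-position cues; the sketch only shows how the stages fit together.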