Title of invention System for extracting attached text from a table-cell frame
Abstract A method for identifying and extracting text data from a table-cell frame. The method includes the steps of tracing connected components of a document image, tracing white contours within a connected component, defining a frame outline based on the white contours, identifying unattached character data inside the frame outline, and defining an initial rectangular area inside the frame outline. The method further includes detecting black pixels in a horizontal or vertical direction from the initial rectangular area in order to create an extended character area, locating, for each white contour, the boundary pixels lying inside the extended character area, identifying black pixels positioned between those boundary pixels, and combining the identified black pixels so as to form at least one connected component. The at least one connected component is recognized as a text component if it is not recognized as a vertical line, as a horizontal line, as part of a broken line, or as part of the frame. Finally, the method defines a character node of a hierarchical tree structure that corresponds to the extended character area and contains both the at least one connected component and any identified unattached connected components.
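Publication number EP0814422(A2) Publication date 1997.12.29
Application number EP19970304087 Filing date 1997.06.11
Applicant CANON KABUSHIKI KAISHA Inventor SHIN-YWAN, WANG
Classification G06K9/20;G06K9/34;G06T11/60;(IPC1-7):G06K9/20 Main classification G06K9/20
Agency Agent
Principal claim
Address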
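
The abstract's step of detecting black pixels in a horizontal or vertical direction from the initial rectangular area can be illustrated with a minimal sketch. This is not the patented procedure. It assumes a binary image stored as a list of rows (1 = black, 0 = white), rectangles given as inclusive (top, left, bottom, right) tuples, and a growth limit equal to the interior of the frame outline so that the frame lines themselves are never absorbed. The names `extend_area`, `CELL`, and `parse` are hypothetical and exist only for this example.

```python
from typing import List, Tuple

# Inclusive rectangle: (top, left, bottom, right). Hypothetical convention.
Rect = Tuple[int, int, int, int]


def parse(rows: List[str]) -> List[List[int]]:
    """Turn rows of '#' (black) and '.' (white) into a 0/1 pixel grid."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def extend_area(image: List[List[int]], rect: Rect, limit: Rect) -> Rect:
    """Grow rect by one row or column at a time while the row or column
    just outside it contains a black pixel, never leaving limit.

    Assumption: limit is the interior of the frame outline, so frame
    lines are excluded from the search.
    """
    top, left, bottom, right = rect
    l_top, l_left, l_bottom, l_right = limit
    changed = True
    while changed:
        changed = False
        # Vertical direction: look at the row above and the row below.
        if top > l_top and any(image[top - 1][c] for c in range(left, right + 1)):
            top -= 1
            changed = True
        if bottom < l_bottom and any(image[bottom + 1][c] for c in range(left, right + 1)):
            bottom += 1
            changed = True
        # Horizontal direction: look at the column to the left and right.
        if left > l_left and any(image[r][left - 1] for r in range(top, bottom + 1)):
            left -= 1
            changed = True
        if right < l_right and any(image[r][right + 1] for r in range(top, bottom + 1)):
            right += 1
            changed = True
    return (top, left, bottom, right)


# A 7x9 table cell: the border is the frame, the two pixels on row 3 are
# unattached character data, and the stroke in column 4 touches the top
# frame line (attached text).
CELL = parse([
    "#########",
    "#...#...#",
    "#...#...#",
    "#..##...#",
    "#.......#",
    "#.......#",
    "#########",
])

INTERIOR: Rect = (1, 1, 5, 7)   # frame interior, used as the growth limit
INITIAL: Rect = (3, 3, 3, 4)    # box around the unattached pixels

print(extend_area(CELL, INITIAL, INTERIOR))  # (1, 3, 3, 4)
```

In this toy cell the initial rectangle grows upward along the attached stroke until it reaches the inner edge of the frame, which gives the area in which boundary pixels of the white contours would then be examined. The later steps of the abstract (classifying components as lines, broken lines, frame parts, or text, and building the hierarchical tree) are not covered by this sketch.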