Title: System and method for identifying regular geometric structures in document pages
Abstract: A system and method for identifying regular geometric structures in a document page are disclosed. For a document page on which a set of page elements has been identified, the method includes identifying, where present, geometric relations among a subset of the page elements, drawn from a predefined set of geometric relations, and identifying a geometric structure comprising regular rows and regular columns based on those relations. Constraints of a definition of a regular geometric structure are then applied to the identified geometric structure. Where the subset of page elements includes regular rows and regular columns forming a geometric structure that meets these constraints, the subset is identified as forming a regular geometric structure, and it may be labeled as such or tested to determine whether it can be expanded by adding one or more rows or columns.
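The relation-finding and row/column steps in the abstract can be illustrated with a minimal Python sketch. The record does not list the patent's predefined set of geometric relations, so this sketch assumes just two: elements are related as "same row" when their vertical extents overlap and as "same column" when their horizontal extents overlap. Rows and columns are then taken as the connected components of each relation. The names `Box`, `same_row`, `same_col`, `groups`, and `candidate_grid` are hypothetical and do not come from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Box:
    """Hypothetical page element: an axis-aligned bounding box (y grows downward)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def same_row(a: Box, b: Box) -> bool:
    # Assumed relation: vertical extents overlap, so the boxes sit side by side.
    return a.y0 < b.y1 and b.y0 < a.y1


def same_col(a: Box, b: Box) -> bool:
    # Assumed relation: horizontal extents overlap, so the boxes are stacked.
    return a.x0 < b.x1 and b.x0 < a.x1


def groups(boxes: list[Box], related: Callable[[Box, Box], bool]) -> list[list[Box]]:
    """Connected components of the graph whose edges are the related pairs (union-find)."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if related(boxes[i], boxes[j]):
                parent[find(i)] = find(j)
    comps: dict[int, list[Box]] = {}
    for i, box in enumerate(boxes):
        comps.setdefault(find(i), []).append(box)
    return list(comps.values())


def candidate_grid(boxes: list[Box]) -> tuple[list[list[Box]], list[list[Box]]]:
    """Candidate rows (top to bottom) and candidate columns (left to right)."""
    rows = sorted(groups(boxes, same_row), key=lambda g: min(b.y0 for b in g))
    cols = sorted(groups(boxes, same_col), key=lambda g: min(b.x0 for b in g))
    return rows, cols


A, B = Box("A", 0, 0, 10, 10), Box("B", 20, 0, 30, 10)
C, D = Box("C", 0, 20, 10, 30), Box("D", 20, 20, 30, 30)
rows, cols = candidate_grid([A, B, C, D])
print([[b.name for b in r] for r in rows])  # [['A', 'B'], ['C', 'D']]
print([[b.name for b in c] for c in cols])  # [['A', 'C'], ['B', 'D']]
```

If a header box spanning both columns is added, `same_col` links every element into a single column. A candidate like that is exactly what the constraints of the regular-structure definition are meant to reject, which is why grouping alone is not enough.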
Publication number: US9008443(B2); Publication date: 2015.04.14
Application number: US201213530141; Filing date: 2012.06.22
Applicant: Xerox Corporation; Inventor: Déjean Hervé
Classifications: G06K9/46, G06K9/00; Main classification: G06K9/46
Agency: Fay Sharpe LLP; Agent: Fay Sharpe LLP
Main claim: 1. A method for identifying regular geometric structures in a document page, comprising: for a document page for which a set of page elements have been identified, at least some of the elements comprising more than one line of text, providing for:
  identifying geometric relations among a subset of the page elements that comprise more than one line of text, from a predefined set of geometric relations;
  identifying a geometric structure comprising regular rows and regular columns, based on the identified geometric relations;
  applying constraints of a definition of a regular geometric structure to the identified geometric structure, the applied constraints being selected from the group consisting of:
    each page element in a regular geometric structure belongs to no more than one regular row and no more than one regular column of the regular geometric structure;
    each page element of a regular row of the regular geometric structure does not vertically overlap any other page elements of the regular geometric structure other than those in that regular row; and
    each page element of a regular column of the regular geometric structure does not horizontally overlap any other page elements of the regular geometric structure other than those in that regular column; and
  where the subset of page elements includes regular rows and regular columns forming a geometric structure which is determined to meet the constraints of the definition of a regular geometric structure, identifying the subset of the page elements as forming a regular geometric structure, wherein at least one of the identifying of geometric relations, the identifying of the geometric structure, and the applying constraints is performed by a computer processor.
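Once candidate rows and columns are known, the three constraints listed in the claim can be checked mechanically. The minimal Python sketch below applies all three, although the claim only requires constraints selected from this group. It assumes that "vertically overlap" means the vertical extents (y-intervals) of two bounding boxes intersect and that "horizontally overlap" means their x-intervals intersect. That reading, and the names `Box`, `index_members`, and `is_regular`, are assumptions for illustration rather than the patent's own definitions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical page element: an axis-aligned bounding box (y grows downward)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def v_overlap(a: Box, b: Box) -> bool:
    """Assumed reading of 'vertically overlap': the y-intervals intersect."""
    return a.y0 < b.y1 and b.y0 < a.y1


def h_overlap(a: Box, b: Box) -> bool:
    """Assumed reading of 'horizontally overlap': the x-intervals intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1


def index_members(groups: list[list[Box]]) -> dict[Box, int] | None:
    """Map each element to its group index, or return None if it is in two groups."""
    owner: dict[Box, int] = {}
    for i, group in enumerate(groups):
        for e in group:
            if owner.get(e, i) != i:
                return None
            owner[e] = i
    return owner


def is_regular(rows: list[list[Box]], cols: list[list[Box]]) -> bool:
    row_of, col_of = index_members(rows), index_members(cols)
    # Constraint 1: at most one regular row and one regular column per element.
    if row_of is None or col_of is None:
        return False
    elements = set(row_of) | set(col_of)
    for a in elements:
        for b in elements:
            if a == b:
                continue
            # Constraint 2: a row element may vertically overlap only its own row.
            if a in row_of and row_of.get(b) != row_of[a] and v_overlap(a, b):
                return False
            # Constraint 3: a column element may horizontally overlap only its own column.
            if a in col_of and col_of.get(b) != col_of[a] and h_overlap(a, b):
                return False
    return True


A, B = Box("A", 0, 0, 10, 10), Box("B", 20, 0, 30, 10)
C, D = Box("C", 0, 20, 10, 30), Box("D", 20, 20, 30, 30)
E = Box("E", 0, -20, 30, -10)  # a header spanning both columns
print(is_regular([[A, B], [C, D]], [[A, C], [B, D]]))  # True
print(is_regular([[E], [A, B], [C, D]], [[A, C], [B, D]]))  # False
print(is_regular([[A, B], [A, C]], [[A, C], [B, D]]))  # False
```

The spanning header `E` fails constraint 3 because it horizontally overlaps column members without belonging to their column. The last call fails constraint 1 because `A` is placed in two rows.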
Address: Norwalk, CT, US