Title of invention |
System and method for identifying regular geometric structures in document pages |
Abstract |
A system and method for identifying regular geometric structures in a document page are disclosed. For a document page for which a set of page elements has been identified, the method includes identifying, where present, geometric relations among a subset of the page elements from a predefined set of geometric relations, and identifying a geometric structure comprising regular rows and regular columns based on the identified geometric relations. Constraints of a definition of a regular geometric structure are then applied to the identified geometric structure. Where the subset of page elements includes regular rows and regular columns forming a geometric structure that meets those constraints, the subset is identified as forming a regular geometric structure; it may then be labeled, or tested to determine whether it can be expanded by adding one or more rows or columns. |
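The first two steps of the method (finding geometric relations, then grouping elements into rows and columns) can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes each page element has been reduced to an axis-aligned bounding box, and it substitutes a hypothetical two-member relation set (`same_row` when two boxes overlap vertically, `same_column` when they overlap horizontally) for the predefined set of geometric relations, which the text does not enumerate. Candidate rows and columns are taken as the connected groups of elements under each relation, found with union-find. The names `Box`, `relations`, and `group` are illustrative only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of one page element (x grows right, y grows down)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def _overlap(a0, a1, b0, b1):
    """True if intervals [a0, a1] and [b0, b1] share a positive-length part."""
    return min(a1, b1) > max(a0, b0)


def relations(boxes):
    """Pairwise relations from a hypothetical predefined set
    {'same_row', 'same_column'}: vertical overlap puts two elements in the
    same row band, horizontal overlap puts them in the same column band."""
    rels = set()
    for a, b in combinations(boxes, 2):
        if _overlap(a.y0, a.y1, b.y0, b.y1):
            rels.add(("same_row", a.name, b.name))
        if _overlap(a.x0, a.x1, b.x0, b.x1):
            rels.add(("same_column", a.name, b.name))
    return rels


def group(names, rels, kind):
    """Connected components of one relation kind, computed with union-find."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for k, a, b in rels:
        if k == kind:
            parent[find(a)] = find(b)
    comps = {}
    for n in names:
        comps.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in comps.values())


if __name__ == "__main__":
    page = [Box("A", 0, 0, 10, 5), Box("B", 20, 0, 30, 5),
            Box("C", 0, 10, 10, 15), Box("D", 20, 10, 30, 15)]
    rels = relations(page)
    names = [b.name for b in page]
    print(group(names, rels, "same_row"))     # [['A', 'B'], ['C', 'D']]
    print(group(names, rels, "same_column"))  # [['A', 'C'], ['B', 'D']]
```

Connected grouping is a simplification chosen for brevity: a chain of partially overlapping elements would be merged into a single candidate row or column, so groups found this way would still need to be checked against the regularity constraints before being accepted.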
Publication number |
US9008443(B2) |
Publication date |
2015.04.14 |
Application number |
US201213530141 |
Filing date |
2012.06.22 |
Applicant |
Xerox Corporation |
Inventor |
Déjean Hervé |
Classification |
G06K9/46;G06K9/00 |
Main classification |
G06K9/46 |
Agency |
Fay Sharpe LLP |
Agent |
Fay Sharpe LLP |
Main claim |
1. A method for identifying regular geometric structures in a document page, comprising:
for a document page for which a set of page elements have been identified, at least some of the elements comprising more than one line of text, providing for:
identifying geometric relations among a subset of the page elements that comprise more than one line of text, from a predefined set of geometric relations;
identifying a geometric structure comprising regular rows and regular columns, based on the identified geometric relations;
applying constraints of a definition of a regular geometric structure to the identified geometric structure, the applied constraints being selected from the group consisting of:
each page element in a regular geometric structure belongs to no more than one regular row and no more than one regular column of the regular geometric structure;
each page element of a regular row of the regular geometric structure does not vertically overlap any other page elements of the regular geometric structure other than those in that regular row; and
each page element of a regular column of the regular geometric structure does not horizontally overlap any other page elements of the regular geometric structure other than those in that regular column; and
where the subset of page elements includes regular rows and regular columns forming a geometric structure which is determined to meet the constraints of the definition of a regular geometric structure, identifying the subset of the page elements as forming a regular geometric structure, wherein at least one of the identifying of geometric relations, the identifying of the geometric structure, and the applying constraints is performed by a computer processor. |
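The constraint step of claim 1 can be sketched as a small Python check. This minimal version applies all three listed constraints together, although the claim allows any selection from the group. It assumes a candidate structure is given as a list of rows and a list of columns, each a list of bounding boxes, and it reads "vertical overlap" as overlapping y-extents and "horizontal overlap" as overlapping x-extents. The names `Box`, `is_regular`, `one_row_one_column`, and `separated` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of one page element (x grows right, y grows down)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def _overlap(a0, a1, b0, b1):
    """True if intervals [a0, a1] and [b0, b1] share a positive-length part."""
    return min(a1, b1) > max(a0, b0)


def one_row_one_column(rows, cols):
    """Constraint 1: no element appears more than once across the rows,
    nor more than once across the columns."""
    for groups in (rows, cols):
        seen = set()
        for group in groups:
            for box in group:
                if box in seen:
                    return False
                seen.add(box)
    return True


def separated(groups, members, axis):
    """Constraints 2 and 3: no element of a group overlaps, along `axis`, a
    structure element outside that group. axis='y' checks vertical overlap
    (rows); axis='x' checks horizontal overlap (columns)."""
    for group in groups:
        outside = members - set(group)
        for a in group:
            for b in outside:
                if axis == "y" and _overlap(a.y0, a.y1, b.y0, b.y1):
                    return False
                if axis == "x" and _overlap(a.x0, a.x1, b.x0, b.x1):
                    return False
    return True


def is_regular(rows, cols):
    """Apply all three constraints to a candidate rows/columns structure."""
    members = {b for g in rows for b in g} | {b for g in cols for b in g}
    return (one_row_one_column(rows, cols)
            and separated(rows, members, "y")
            and separated(cols, members, "x"))


if __name__ == "__main__":
    a, b = Box("A", 0, 0, 10, 5), Box("B", 20, 0, 30, 5)
    c, d = Box("C", 0, 10, 10, 15), Box("D", 20, 10, 30, 15)
    print(is_regular([[a, b], [c, d]], [[a, c], [b, d]]))  # True
    tall_c = Box("C", 0, 0, 10, 15)  # extends up into the first row's band
    print(is_regular([[a, b], [tall_c, d]], [[a, tall_c], [b, d]]))  # False
```

A 2-by-2 grid of separate boxes passes. Making the lower-left box tall enough to reach the upper row fails the second constraint, because that box now vertically overlaps elements outside its own row.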
Address |
Norwalk CT US |