Title of invention |
Automated document processing system |
Abstract |
An automated document processing system is configured to normalize zones obtained from a document, and to extract articles from the normalized zones. In one configuration, the system receives at least one zone from the document, and applies at least one zone-breaking factor, thereby creating normalized sub-zones within which text lines are consistent with the at least one zone-breaking factor. The normalized sub-zones may be evaluated to obtain a reading order. Adjacent sub-zones are joined if text similarity exceeds a threshold value. Weakly joined sub-zones are separated where indicated by a topic vectors analysis of the weakly joined sub-zones. |
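The zone-breaking step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the zone-breaking factor is font size and that a zone is an ordered list of text lines; the names `TextLine`, `font_size`, and `break_zone` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List


@dataclass
class TextLine:
    text: str
    font_size: float  # hypothetical attribute, assumed here as the zone-breaking factor


def break_zone(
    lines: List[TextLine],
    factor: Callable[[TextLine], object] = lambda ln: ln.font_size,
) -> List[List[TextLine]]:
    """Split a zone into sub-zones of consecutive lines that share one factor value."""
    # groupby only merges *consecutive* items with equal keys, so line order is preserved
    return [list(group) for _, group in groupby(lines, key=factor)]


zone = [
    TextLine("Headline", 18.0),
    TextLine("First body line", 10.0),
    TextLine("Second body line", 10.0),
    TextLine("Caption", 8.0),
]
sub_zones = break_zone(zone)
```

Within each resulting sub-zone, every line agrees on the chosen factor, which is the consistency property the abstract attributes to normalized sub-zones. Other factors (for example, line spacing or alignment) could be passed through the `factor` argument.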
Publication number |
US8948511(B2) |
Publication date |
2015.02.03 |
Application number |
US200511253305 |
Filing date |
2005.10.19 |
Applicant |
Hewlett-Packard Development Company, L.P. |
Inventors |
Ortega Daniel;Yacoub Sherif;Peiro Jose Abad;Faraboschi Paolo |
Classification |
G06K9/34;G06K9/00 |
Main classification |
G06K9/34 |
Agency |
Lee & Hayes, PLLC |
Agent |
Lee & Hayes, PLLC; Thompson David S. |
Independent claim |
1. One or more computer-readable non-transitory storage media comprising computer-executable instructions for configuring a computer to extract an article from a document, the computer-executable instructions comprising instructions for:
evaluating zones within a page of a document to obtain a reading order; joining adjacent zones within the reading order, where appropriate, in view of text similarity; and breaking weakly joined zones using topic vectors analysis. |
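The joining step of the claim can be sketched as follows. This is a rough illustration only: it assumes zones are plain strings already in reading order, and it uses bag-of-words cosine similarity with an arbitrary threshold of 0.3 as a stand-in for the text-similarity measure, which the claim does not specify. The function names are hypothetical, and the topic vectors analysis that breaks weakly joined zones is not sketched here.

```python
import math
from collections import Counter
from typing import List


def bag(text: str) -> Counter:
    """Lower-cased bag of words (assumed representation, not from the patent)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def join_zones(zones: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Join each zone to the previous one when their similarity exceeds the threshold."""
    articles: List[List[str]] = []
    for zone in zones:
        if articles and cosine(bag(articles[-1][-1]), bag(zone)) > threshold:
            articles[-1].append(zone)  # similar enough: same candidate article
        else:
            articles.append([zone])  # otherwise start a new candidate article
    return articles


zones = [
    "stock market prices rose",
    "market prices fell on stock news",
    "football team wins cup",
]
articles = join_zones(zones)
```

Here the first two zones share three words and are joined, while the third shares none and starts a new group. A fuller sketch would also record how strongly each pair was joined, so that weak joins could be re-examined in a later pass.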
Address |
Houston TX US |