Title of invention Automated document processing system
Abstract An automated document processing system is configured to normalize zones obtained from a document, and to extract articles from the normalized zones. In one configuration, the system receives at least one zone from the document, and applies at least one zone-breaking factor, thereby creating normalized sub-zones within which text lines are consistent with the at least one zone-breaking factor. The normalized sub-zones may be evaluated to obtain a reading order. Adjacent sub-zones are joined if text similarity exceeds a threshold value. Weakly joined sub-zones are separated where indicated by a topic vectors analysis of the weakly joined sub-zones.
Publication number US8948511(B2) Publication date 2015.02.03
Application number US200511253305 Filing date 2005.10.19
Applicant Hewlett-Packard Development Company, L.P. Inventors Ortega Daniel; Yacoub Sherif; Peiro Jose Abad; Faraboschi Paolo
Classification G06K9/34; G06K9/00 Main classification G06K9/34
Agency Lee & Hayes, PLLC Agent Lee & Hayes, PLLC; Thompson David S.
Principal claim 1. One or more computer-readable non-transitory storage media comprising computer-executable instructions for configuring a computer to extract an article from a document, the computer-executable instructions comprising instructions for: evaluating zones within a page of a document to obtain a reading order; joining adjacent zones within the reading order, where appropriate, in view of text similarity; and breaking weakly joined zones using topic vectors analysis.
Address Houston TX US
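
The joining step named in the abstract and in claim 1 (adjacent zones in reading order are joined when text similarity exceeds a threshold) can be illustrated with a minimal sketch. This record does not say how similarity is measured, so the cosine similarity over word counts used here is an assumption, as are the function names `term_counts`, `cosine_similarity`, and `join_adjacent_zones` and the default threshold of 0.2. The sketch assumes the zones are already in reading order and does not cover zone normalization or the topic vectors analysis that breaks weakly joined zones.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-cased bag-of-words counts for one zone's text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def join_adjacent_zones(zones, threshold=0.2):
    """Group zones (assumed to be in reading order) into candidate articles.

    A zone joins the current group when its similarity to the previous
    zone exceeds `threshold` (an illustrative value); otherwise it starts
    a new group.
    """
    groups = []
    for zone in zones:
        if groups:
            previous = groups[-1][-1]
            score = cosine_similarity(term_counts(previous), term_counts(zone))
            if score > threshold:
                groups[-1].append(zone)
                continue
        groups.append([zone])
    return groups
```

Calling `join_adjacent_zones` on a list of zone texts returns a list of groups, each group being a run of consecutive zones judged similar enough to belong together. Comparing each zone only with its immediate predecessor keeps the sketch short; comparing against the whole accumulated group would be an equally plausible reading of the abstract.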