Title: METHODS AND APPARATUS TO EXTRACT TEXT FROM IMAGED DOCUMENTS
Abstract: Methods and apparatus to extract text from imaged documents are disclosed. Example methods include segmenting an image of a document into localized sub-images corresponding to individual characters in the document. The example methods further include grouping respective ones of the sub-images into a cluster based on a visual correlation of the respective ones of the sub-images to a reference sub-image. The visual correlation between the reference sub-image and the respective ones of the sub-images grouped into the cluster exceeds a correlation threshold. The example methods also include identifying a designated character for the cluster based on the sub-images grouped into the cluster. The example methods further include associating the designated character with locations in the image of the document associated with the respective ones of the sub-images grouped into the cluster.
Publication number: US2017124413(A1)  Publication date: 2017.05.04
Application number: US201514927014  Filing date: 2015.10.29
Applicant: The Nielsen Company (US), LLC  Inventor: Deng Kevin Keqiang
Classification: G06K9/34;G06K9/64;G06K9/03;G06K9/62  Main classification: G06K9/34
Agency:  Agent:
Main claim: 1. A method comprising: segmenting, by executing an instruction with a processor, an image of a document into localized sub-images corresponding to individual characters in the document; grouping, by executing an instruction with the processor, respective ones of the sub-images into a cluster based on visual correlations of the respective ones of the sub-images to a reference sub-image, the visual correlations between the reference sub-image and the respective ones of the sub-images grouped into the cluster exceeding a correlation threshold; identifying, by executing an instruction with the processor, a designated character for one representative sub-image associated with the cluster; assigning, by executing an instruction with the processor, the designated character to the respective ones of the sub-images grouped into the cluster; and associating, by executing an instruction with the processor, the designated character with locations in the image of the document associated with the respective ones of the sub-images grouped into the cluster.
Address: New York NY US
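
The steps recited in the main claim (grouping by correlation to a reference sub-image, identifying one character per cluster, and associating it with each member's location) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes segmentation has already produced equal-size sub-images, uses Pearson correlation as a stand-in for the "visual correlation" (the record does not specify the measure), treats the cluster's reference sub-image as the representative, and uses a greedy first-match grouping. The names `correlation`, `cluster_sub_images`, `extract_text`, and the `recognize` callback are hypothetical.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation of two equal-size sub-images (flat pixel tuples).

    Assumption: this stands in for the claim's "visual correlation".
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat image carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom


def cluster_sub_images(sub_images, threshold):
    """Greedy grouping (an assumption): a sub-image joins the first cluster
    whose reference it correlates with above the threshold; otherwise it
    becomes the reference of a new cluster."""
    clusters = []
    for location, pixels in sub_images:
        for cluster in clusters:
            if correlation(cluster["reference"], pixels) > threshold:
                cluster["members"].append(location)
                break
        else:
            clusters.append({"reference": pixels, "members": [location]})
    return clusters


def extract_text(sub_images, recognize, threshold=0.7):
    """Identify one designated character per cluster, then associate it
    with the location of every sub-image grouped into that cluster."""
    located = {}
    for cluster in cluster_sub_images(sub_images, threshold):
        designated = recognize(cluster["reference"])  # one call per cluster
        for location in cluster["members"]:
            located[location] = designated
    return located


# Toy 3x3 binary glyphs, flattened row by row (illustrative data only).
GLYPH_A = (0, 1, 0, 1, 1, 1, 1, 0, 1)
GLYPH_A_NOISY = (0, 1, 0, 1, 1, 1, 1, 0, 0)  # one pixel dropped
GLYPH_B = (1, 1, 0, 1, 1, 0, 1, 1, 1)

calls = []


def recognize(pixels):
    """Hypothetical stand-in for an OCR engine or a human reviewer."""
    calls.append(pixels)
    return {GLYPH_A: "A", GLYPH_B: "B"}[pixels]


sub_images = [
    ((0, 0), GLYPH_A),
    ((10, 0), GLYPH_B),
    ((20, 0), GLYPH_A_NOISY),
    ((30, 0), GLYPH_B),
]
located = extract_text(sub_images, recognize)
print(located)
print("recognition calls:", len(calls))
```

In this sketch the character is identified once per cluster rather than once per sub-image, so four sub-images need only two recognition calls. The 0.7 threshold is an arbitrary choice for the toy data and would need tuning on real scans.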