Independent claim |
1. A method comprising:
segmenting, by executing an instruction with a processor, an image of a document into localized sub-images corresponding to individual characters in the document;
grouping, by executing an instruction with the processor, respective ones of the sub-images into a cluster based on visual correlations of the respective ones of the sub-images to a reference sub-image, the visual correlations between the reference sub-image and the respective ones of the sub-images grouped into the cluster exceeding a correlation threshold;
identifying, by executing an instruction with the processor, a designated character for one representative sub-image associated with the cluster;
assigning, by executing an instruction with the processor, the designated character to the respective ones of the sub-images grouped into the cluster; and
associating, by executing an instruction with the processor, the designated character with locations in the image of the document associated with the respective ones of the sub-images grouped into the cluster. |
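The grouping, identifying, assigning, and associating steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation and rests on several assumptions: segmentation is taken as already done, so each sub-image arrives as a `(location, pixels)` pair; Pearson correlation over flattened pixel values stands in for the unspecified "visual correlation"; the first unassigned sub-image serves as both the reference and the representative of its cluster; and `recognize` is a hypothetical callable standing in for whatever mechanism identifies the designated character. The names `correlation`, `cluster_sub_images`, `label_document`, and the 0.9 threshold are illustrative only.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation between two equal-length pixel sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat sub-image is treated as uncorrelated (assumption)
    return cov / sqrt(var_a * var_b)


def cluster_sub_images(sub_images, threshold=0.9):
    """Greedily group sub-image indices around a reference sub-image.

    The first unassigned sub-image becomes the reference; every other
    unassigned sub-image whose correlation to it exceeds the threshold
    joins the same cluster.
    """
    clusters = []
    unassigned = list(range(len(sub_images)))
    while unassigned:
        ref = unassigned[0]
        ref_pixels = sub_images[ref][1]
        members = [
            i for i in unassigned
            if i == ref or correlation(ref_pixels, sub_images[i][1]) > threshold
        ]
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]
    return clusters


def label_document(sub_images, recognize, threshold=0.9):
    """Map each sub-image location to the character designated for its cluster."""
    labels = {}
    for members in cluster_sub_images(sub_images, threshold):
        # Identify the character once, for one representative sub-image.
        character = recognize(sub_images[members[0]][1])
        # Assign it to every member and associate it with each location.
        for i in members:
            labels[sub_images[i][0]] = character
    return labels


# Toy 3x3 glyphs, flattened row by row (illustrative data, not from the claim).
GLYPH_I = (0, 1, 0, 0, 1, 0, 0, 1, 0)
GLYPH_L = (1, 0, 0, 1, 0, 0, 1, 1, 1)
GLYPH_I_NOISY = (0, 0.9, 0, 0.1, 1, 0, 0, 1, 0)

recognize_calls = []


def recognize(pixels):
    """Hypothetical recognizer: a lookup table that records each call."""
    recognize_calls.append(pixels)
    return {GLYPH_I: "I", GLYPH_L: "L"}[pixels]


sub_images = [((0, 0), GLYPH_I), ((0, 4), GLYPH_L), ((0, 8), GLYPH_I_NOISY)]
labels = label_document(sub_images, recognize)
print(labels)
```

In this toy run the noisy glyph is never passed to `recognize`; it inherits its character from the cluster's representative, which is the point of identifying one sub-image per cluster rather than every sub-image individually.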