Independent claim |
1. A computer-implemented method for processing a digital image including recognizing characters with one or more components, comprising:
obtaining, with a computing device, a digital image of content including multiple characters, wherein a first character of the multiple characters includes components;
generating, with the computing device, a vectorized version of the digital image of the content, wherein the vectorized version of the digital image has one or more vector representations corresponding to the components;
identifying, with the computing device, a first component of the components in the digital image, the identifying including evaluating the vectorized representations of the one or more components with a neural network having been trained to recognize components by matching the first component to one or more predefined component shape patterns, wherein a first component shape pattern of the one or more component shape patterns visually represents a component shape;
determining, with the computing device, one or more characters from the first component, the determining including evaluating the identified components using a plurality of combination rules configured to detect relationships between the first component and a second identified component based at least in part on spatial positions of the first component relative to the second component, and to determine the characters from the detected relationships, the determining further including determining the characters based in part on positions of the characters on a page of text in the digital image of the content; and
sending, with the computing device, character codes for the determined one or more characters, the character codes specified by the plurality of combination rules. |
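The determining step of the claim can be illustrated with a minimal sketch, assuming the components have already been identified and labeled by some upstream recognizer. The vectorization step, the neural network, and the page-position logic are not modeled here. The `Component` class, the `relation` helper, the bounding-box geometry, and the contents of `COMBINATION_RULES` are all hypothetical names and data chosen for illustration; the claim does not specify how rules are stored or how spatial relationships are computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    """An identified component with an assumed bounding box (y grows downward)."""
    label: str  # component shape label, assumed to come from a recognizer
    x: float    # left edge
    y: float    # top edge
    w: float    # width
    h: float    # height

    @property
    def cx(self) -> float:
        return self.x + self.w / 2

    @property
    def cy(self) -> float:
        return self.y + self.h / 2


def relation(a: Component, b: Component) -> str:
    """Classify where `a` sits relative to `b` by comparing box centers."""
    dx = b.cx - a.cx
    dy = b.cy - a.cy
    if abs(dx) >= abs(dy):
        return "left_of" if dx > 0 else "right_of"
    return "above" if dy > 0 else "below"


# Hypothetical rule table: (first label, relation, second label) -> character code.
# Each rule specifies the character code that is emitted when it matches.
COMBINATION_RULES = {
    ("木", "left_of", "木"): 0x6797,  # 林
    ("日", "left_of", "月"): 0x660E,  # 明
    ("口", "above", "木"): 0x5446,    # 呆
    ("木", "above", "口"): 0x674F,    # 杏
}


def determine_character(first: Component, second: Component) -> Optional[int]:
    """Return the character code given by a matching rule, or None if no rule applies."""
    key = (first.label, relation(first, second), second.label)
    return COMBINATION_RULES.get(key)


if __name__ == "__main__":
    mouth = Component("口", x=0, y=0, w=10, h=5)
    tree = Component("木", x=0, y=5, w=10, h=5)
    code = determine_character(mouth, tree)
    print(hex(code), chr(code))
```

In this sketch the same two components yield different characters depending on which one is on top, which is the role the claim assigns to spatial position in the combination rules. A dictionary lookup is only one possible encoding of such rules.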