Abstract |
The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems to which the current document is directed carry out an initial processing step on one or more scanned images to identify, for each symbol image within a scanned document, a set of graphemes that match, with high frequency, symbol patterns that, in turn, match the symbol image. The set of graphemes identified for a symbol image is associated with the symbol image as a set of candidate graphemes for the symbol image. The set of candidate graphemes is then used, in one or more subsequent steps, to associate each symbol image with a most likely corresponding symbol code. |
Principal claim |
1. An optical-symbol-recognition system comprising:
a memory device; and one or more processors, coupled to the memory device, to:
receive a document image comprising a first symbol image of a plurality of symbol images;
initialize a vote data structure that comprises a plurality of entries, an entry to store a cumulative vote of a plurality of cumulative votes and an associated grapheme of a plurality of graphemes, each grapheme corresponding to a symbol code of a plurality of symbol codes;
determine, for the first symbol image, a cumulative vote for each of the plurality of graphemes, the cumulative vote based on a number of matches between the first symbol image and one or more graphemes of the plurality of graphemes;
store determined cumulative votes for the first symbol image in the plurality of entries in the vote data structure;
sort the plurality of entries in the vote data structure using the cumulative votes;
select a grapheme from one of the plurality of entries that is identified based on an order associated with the cumulative votes; and
generate a digital document comprising a symbol code corresponding to the selected grapheme. |
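
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `VoteEntry`, `init_votes`, `accumulate_votes`, `select_symbol_code`, and `recognize` are hypothetical, as are the pattern and grapheme identifiers. The sketch assumes that matching a symbol image against symbol patterns has already happened elsewhere and is supplied as a list of matched pattern identifiers, that each vote counts one such match, and that symbol codes are Unicode code points.

```python
from dataclasses import dataclass


@dataclass
class VoteEntry:
    """One entry of the vote data structure (hypothetical layout)."""
    grapheme: str      # grapheme identifier
    symbol_code: int   # symbol code; assumed here to be a Unicode code point
    votes: int = 0     # cumulative vote for this grapheme


def init_votes(graphemes):
    """Initialize one zero-vote entry per grapheme."""
    return [VoteEntry(g, code) for g, code in graphemes.items()]


def accumulate_votes(entries, matched_patterns, pattern_to_graphemes):
    """Add one vote to a grapheme for every matched pattern that lists it."""
    by_grapheme = {e.grapheme: e for e in entries}
    for pattern in matched_patterns:
        for g in pattern_to_graphemes.get(pattern, ()):
            by_grapheme[g].votes += 1


def select_symbol_code(entries):
    """Sort entries by descending cumulative vote and take the first one."""
    entries.sort(key=lambda e: e.votes, reverse=True)
    return entries[0].symbol_code


def recognize(matched_patterns, graphemes, pattern_to_graphemes):
    """Return the character for the top-voted grapheme of one symbol image."""
    entries = init_votes(graphemes)
    accumulate_votes(entries, matched_patterns, pattern_to_graphemes)
    return chr(select_symbol_code(entries))


# Illustrative data only: three visually similar characters.
GRAPHEMES = {"g_ri": 0x65E5, "g_mu": 0x76EE, "g_yue": 0x66F0}
PATTERN_TO_GRAPHEMES = {
    "p1": ["g_ri", "g_yue"],
    "p2": ["g_ri", "g_mu"],
    "p3": ["g_ri"],
}

if __name__ == "__main__":
    print(recognize(["p1", "p2", "p3"], GRAPHEMES, PATTERN_TO_GRAPHEMES))
```

In this sketch the sort is stable, so graphemes with equal votes keep their initialization order. The claim only requires that the selection follow an order associated with the cumulative votes; taking the single top entry is one simple choice among several.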