Title of invention: Methods and systems for efficient automated symbol recognition using multiple clusters of symbol patterns
Abstract: The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems to which the current document is directed carry out an initial processing step on one or more scanned images to identify, for each symbol image within a scanned document, a set of graphemes that match, with high frequency, symbol patterns that, in turn, match the symbol image. The set of graphemes identified for a symbol image is associated with the symbol image as a set of candidate graphemes for the symbol image. The set of candidate graphemes is then used, in one or more subsequent steps, to associate each symbol image with a most likely corresponding symbol code.
Publication number: US9633256(B2)  Publication date: 2017.04.25
Application number: US201414565782  Filing date: 2014.12.10
Applicant: ABBYY Development LLC  Inventor: Chulinin Yuri
Classification: G06K9/18;G06K9/00  Main classification: G06K9/18
Agency:  Agent: Weinstein Veronica
Principal claim: 1. An optical-symbol-recognition system comprising: a memory device; and one or more processors, coupled to the memory device, to: receive a document image comprising a first symbol image of a plurality of symbol images; initialize a vote data structure that comprises a plurality of entries, an entry to store a cumulative vote of a plurality of cumulative votes and an associated grapheme of a plurality of graphemes, each grapheme corresponding to a symbol code of a plurality of symbol codes; determine, for the first symbol image, a cumulative vote for each of the plurality of graphemes, the cumulative vote based on a number of matches between the first symbol image and one or more graphemes of the plurality of graphemes; store determined cumulative votes for the first symbol image in the plurality of entries in the vote data structure; sort the plurality of entries in the vote data structure using the cumulative votes; select a grapheme from one of the plurality of entries that is identified based on an order associated with the cumulative votes; and generate a digital document comprising a symbol code corresponding to the selected grapheme.
Address: RU
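
The voting procedure recited in claim 1 (initialize a vote data structure, accumulate votes per grapheme, sort the entries, select a grapheme, emit its symbol code) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `SymbolPattern`, `hamming_match`, `rank_graphemes`, and `recognize` are hypothetical, as are the tiny 3x3 bitmaps, the grapheme labels, and the Hamming-distance matcher, which stands in for whatever pattern-matching step a real system would use. Following the abstract, the sketch assumes each symbol pattern carries the graphemes it frequently corresponds to, and that a grapheme receives one vote for every pattern that matches the symbol image.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Image = Tuple[int, ...]  # flattened binary bitmap (hypothetical representation)


@dataclass(frozen=True)
class SymbolPattern:
    """Hypothetical symbol pattern: a bitmap plus the graphemes it votes for."""
    bitmap: Image
    graphemes: Tuple[str, ...]


def hamming_match(image: Image, bitmap: Image, max_diff: int = 1) -> bool:
    """Stand-in matcher: equal-length bitmaps match if at most max_diff pixels differ."""
    return sum(a != b for a, b in zip(image, bitmap)) <= max_diff


def rank_graphemes(
    image: Image,
    patterns: Sequence[SymbolPattern],
    grapheme_to_code: Dict[str, int],
) -> List[Tuple[str, int]]:
    # Initialize the vote data structure: one entry per grapheme, vote 0.
    votes = {grapheme: 0 for grapheme in grapheme_to_code}
    # Accumulate one vote per grapheme for every pattern matching the image.
    for pattern in patterns:
        if hamming_match(image, pattern.bitmap):
            for grapheme in pattern.graphemes:
                votes[grapheme] += 1
    # Sort the entries by cumulative vote, highest first (the sort is stable).
    return sorted(votes.items(), key=lambda entry: entry[1], reverse=True)


def recognize(
    image: Image,
    patterns: Sequence[SymbolPattern],
    grapheme_to_code: Dict[str, int],
) -> int:
    """Select the top-ranked grapheme and return its symbol code."""
    best_grapheme, _ = rank_graphemes(image, patterns, grapheme_to_code)[0]
    return grapheme_to_code[best_grapheme]


# Illustrative data only: grapheme labels mapped to Unicode code points.
GRAPHEME_TO_CODE = {"g_ri": 0x65E5, "g_yue": 0x66F0, "g_mu": 0x76EE}
PATTERNS = [
    SymbolPattern((1, 1, 1, 1, 0, 1, 1, 1, 1), ("g_ri", "g_yue")),
    SymbolPattern((1, 1, 1, 1, 0, 1, 1, 1, 0), ("g_ri",)),
    SymbolPattern((0, 0, 0, 1, 1, 1, 0, 0, 0), ("g_mu",)),
]

symbol_image = (1, 1, 1, 1, 0, 1, 1, 1, 1)
ranked = rank_graphemes(symbol_image, PATTERNS, GRAPHEME_TO_CODE)
code = recognize(symbol_image, PATTERNS, GRAPHEME_TO_CODE)
```

In this toy run the first two patterns match the symbol image and the third does not, so `g_ri` collects two votes, `g_yue` one, and `g_mu` none. The sorted list doubles as the set of candidate graphemes described in the abstract, which later steps could narrow further before a symbol code is written to the output document.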