Title of invention: METHODS AND SYSTEMS FOR EFFICIENT AUTOMATED SYMBOL RECOGNITION USING DECISION FORESTS
Abstract: The current document is directed to methods and systems for identifying symbols corresponding to symbol images in a scanned-document image or other text-containing image, with the symbols corresponding to Chinese or Japanese characters, to Korean morpho-syllabic blocks, or to symbols of other languages that use a large number of symbols for writing and printing. In one implementation, the methods and systems carry out an initial processing step on one or more scanned images to identify a set of graphemes that most likely correspond to each symbol image that occurs in the scanned-document image. The graphemes are selected for a symbol image based on accumulated votes generated from symbol patterns that one or more decision forests identify as likely related to the symbol image.
Publication number: US2016247019(A1) Publication date: 2016.08.25
Application number: US201514880583 Filing date: 2015.10.12
Applicant: ABBYY Development LLC Inventors: Chulinin Yury Georgievich; Senkevich Oleg
Classification: G06K9/00 Main classification: G06K9/00
Agency: Agent:
Main claim: 1. An optical-symbol-recognition system comprising: one or more processors; one or more memories; one or more data-storage devices; and computer instructions, stored in one or more of the one or more data-storage devices, that, when executed by one or more of the one or more processors, control the optical-symbol-recognition system to process a text-containing scanned image of a document by: identifying symbol images in the text-containing scanned image of the document; for each page in the document, for each symbol image in the page, identifying, using a decision forest, a set of candidate pattern data structures for the symbol image, using the candidate pattern data structures to identify a set of candidate graphemes, and using the identified set of candidate graphemes to select a symbol code that represents the symbol image; and preparing a processed document containing symbol codes that represent the symbol images in the scanned image of the document and storing the processed document in one or more of the one or more data-storage devices and memories.
Address: Moscow RU
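
The abstract and main claim describe selecting candidate graphemes from votes accumulated over symbol patterns that a decision forest returns. The following is a minimal illustrative sketch of that voting idea only, not the patented implementation. Everything in it is an assumption made for illustration: the function `candidate_graphemes`, the representation of a tree as a callable that returns pattern identifiers, the one-vote-per-pattern weighting, the alphabetical tie-break, and the toy forest, pattern identifiers, and grapheme labels. The record above does not specify any of these details.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Sequence

# Assumption: a "tree" is any callable mapping a feature vector to the
# identifiers of the patterns it considers likely for the symbol image.
Tree = Callable[[Sequence[float]], Iterable[str]]


def candidate_graphemes(
    features: Sequence[float],
    forest: Sequence[Tree],
    pattern_to_graphemes: Dict[str, List[str]],
    top_k: int = 3,
) -> List[str]:
    """Accumulate grapheme votes over all trees and return the top_k graphemes."""
    votes: Counter = Counter()
    for tree in forest:
        for pattern_id in tree(features):
            # Assumption: each candidate pattern casts one vote per grapheme
            # it is associated with.
            for grapheme in pattern_to_graphemes.get(pattern_id, ()):
                votes[grapheme] += 1
    # Sort by descending vote count, breaking ties alphabetically so the
    # result is deterministic.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [grapheme for grapheme, _ in ranked[:top_k]]


# Hypothetical toy forest: three threshold rules over a 2-element feature vector.
toy_forest: List[Tree] = [
    lambda f: ["p1", "p2"] if f[0] > 0.5 else ["p3"],
    lambda f: ["p1"] if f[1] > 0.5 else ["p2", "p3"],
    lambda f: ["p2"] if f[0] + f[1] > 1.0 else ["p3"],
]

# Hypothetical mapping from pattern identifiers to grapheme labels.
toy_patterns: Dict[str, List[str]] = {
    "p1": ["g_sun"],
    "p2": ["g_sun", "g_eye"],
    "p3": ["g_mouth"],
}

strong = candidate_graphemes([0.9, 0.8], toy_forest, toy_patterns, top_k=2)
weak = candidate_graphemes([0.1, 0.2], toy_forest, toy_patterns, top_k=3)
print(strong)
print(weak)
```

In a full pipeline of the kind the claim outlines, the returned candidate graphemes would then feed a later step that selects the symbol code for each symbol image; that step is outside this sketch.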