Title of invention: METHODS AND SYSTEMS THAT BUILD A HIERARCHICALLY ORGANIZED DATA STRUCTURE CONTAINING STANDARD FEATURE SYMBOLS FOR CONVERSION OF DOCUMENT IMAGES TO ELECTRONIC DOCUMENTS
Abstract: The current application is directed to methods and systems that convert document images, which contain Arabic text and text in other languages in which symbols are joined together to produce continuous words and portions of words, into corresponding electronic documents. In one implementation, a document-image-processing method and system to which the current application is directed employs numerous techniques and features that render efficiently computable an otherwise intractable or impractical document-image-to-electronic-document conversion. These techniques and features include transformation of text-image morphemes and words into feature symbols with associated parameters, efficiently identifying similar morphemes and words in an electronic store of standard-feature-symbol-encoded morphemes and words, and identifying candidate inter-character division points and corresponding traversal paths using the similar morphemes and words identified in the word store.
Publication number: US2016147747(A1) Publication date: 2016.05.26
Application number: US201314781652 Filing date: 2013.06.18
Applicant: ABBYY DEVELOPMENT LLC Inventor: Chulinin Yury Georgievich
Classification: G06F17/28;G06F17/22;G06F17/27 Main classification: G06F17/28
Agency: Agent:
Principal claim: 1. A system that transforms natural-language text sources into a searchable electronic database of natural-language morphemes and words, the system comprising: one or more processors; one or more electronic memories; and a hierarchically organized data structure, stored in one or more of the one or more electronic memories, each entry of which corresponds to a morpheme, word, or phrase encoded as sequences of standard feature symbols; and computer instructions, digitally encoded and stored in one or more of the one or more electronic memories and executed on the one or more processors, that extract morphemes and words from sources of Arabic-like morphemes and words; for each extracted morpheme and word, transform the extracted word or morpheme into a sequence of standard feature symbols and storing the standard feature symbols in one of the one or more electronic memories, and store the sequence of standard feature symbols in the hierarchically organized data structure; and provide an electronic interface to the hierarchically organized data structure that allows one or more sequences of standard feature symbols stored within the hierarchically organized data structure similar to an image of a morpheme or word to be automatically searched for and electronically returned to a requesting computational entity.
Address: Moscow RU
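
Illustrative sketch (not part of the patent record): claim 1 describes a hierarchically organized data structure whose entries are words or morphemes encoded as sequences of standard feature symbols, with an interface that returns stored sequences similar to a query. One common way to realize such a structure is a trie searched with a bounded edit distance; the record does not say that this is the structure or similarity measure the applicant uses. The class name `FeatureSymbolTrie`, the symbol names (`LOOP`, `STROKE`, `DOT`, `TAIL`, `HOOK`), and the `max_distance` parameter are all hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One trie level: children keyed by feature symbol, plus stored entries."""
    children: dict = field(default_factory=dict)
    entries: list = field(default_factory=list)


class FeatureSymbolTrie:
    """Hypothetical store of words encoded as feature-symbol sequences."""

    def __init__(self):
        self.root = Node()

    def insert(self, symbols, text):
        """Store `text` under its feature-symbol sequence."""
        node = self.root
        for sym in symbols:
            node = node.children.setdefault(sym, Node())
        node.entries.append(text)

    def search(self, query, max_distance=1):
        """Return sorted (distance, text) pairs within `max_distance` edits."""
        query = list(query)
        # Edit-distance row for the empty stored prefix.
        first_row = list(range(len(query) + 1))
        results = []
        if first_row[-1] <= max_distance:
            results.extend((first_row[-1], t) for t in self.root.entries)
        for sym, child in self.root.children.items():
            self._walk(child, sym, query, first_row, max_distance, results)
        return sorted(results)

    def _walk(self, node, sym, query, prev_row, max_distance, results):
        # Extend the Levenshtein table by one row for the symbol `sym`.
        row = [prev_row[0] + 1]
        for i, q in enumerate(query, start=1):
            cost = 0 if q == sym else 1
            row.append(min(row[i - 1] + 1,         # skip a query symbol
                           prev_row[i] + 1,        # skip a stored symbol
                           prev_row[i - 1] + cost))  # match or substitute
        if row[-1] <= max_distance:
            results.extend((row[-1], t) for t in node.entries)
        # Prune: descend only if some cell can still stay within the bound.
        if min(row) <= max_distance:
            for s, child in node.children.items():
                self._walk(child, s, query, row, max_distance, results)


trie = FeatureSymbolTrie()
trie.insert(["LOOP", "STROKE", "DOT"], "word-1")
trie.insert(["LOOP", "STROKE", "TAIL"], "word-2")
trie.insert(["HOOK", "DOT"], "word-3")
matches = trie.search(["LOOP", "STROKE", "DOT"], max_distance=1)
```

Sharing prefixes in the trie lets one edit-distance row be reused for every stored sequence that begins with the same symbols, and the pruning test abandons whole subtrees once no stored sequence below them can fall within the distance bound.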