Abstract |
<p>Documents represented as bitmap images (S100) are transformed into coded textual data (S120) and coded graphics data (S160) by textual and graphics recognizers, which use a standard notation, expressed in a document description language, to record the results of the document recognition processes, including any ambiguities. Recognized portions of the document, represented as editable coded data such as ASCII, are placed in elements defined in the document description language, with all contents of an element sharing some common characteristic. Elements can include, for example: character-string-elements (S140), questionable-character-elements (S150), questionable-word-elements, verified-word-elements, alternative-word-elements, segment-elements, and arc-elements. Each element contains editable coded data together with uncertainty information (S155) identifying any coded data that was not transformed with a predetermined level of confidence.</p> |
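
The element structure described above can be illustrated with a minimal sketch. It is not the notation of the document description language itself, which the abstract does not specify. The `Element` class, the `build_elements` function, the numeric confidence scores, and the 0.9 threshold are all hypothetical names and values chosen for illustration; the abstract says only that a "predetermined level of confidence" is applied.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical value: the abstract only states that a
# "predetermined level of confidence" exists, not what it is.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Element:
    """Illustrative stand-in for an element of the description language."""
    kind: str        # e.g. "character-string-element"
    data: str        # editable coded data, e.g. ASCII text
    uncertain: bool  # uncertainty information attached to the coded data


def build_elements(
    recognized: Iterable[Tuple[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> List[Element]:
    """Group (character, confidence) pairs into elements.

    Consecutive confident characters are merged into one
    character-string-element; each character below the threshold
    becomes its own questionable-character-element.
    """
    elements: List[Element] = []
    for char, confidence in recognized:
        uncertain = confidence < threshold
        kind = (
            "questionable-character-element"
            if uncertain
            else "character-string-element"
        )
        # Extend the previous run only when both it and this character are confident.
        if elements and not uncertain and elements[-1].kind == kind:
            elements[-1].data += char
        else:
            elements.append(Element(kind, char, uncertain))
    return elements


# Assumed recognizer output: the third character is low confidence.
sample = [("c", 0.99), ("a", 0.98), ("t", 0.50), ("s", 0.97)]
elements = build_elements(sample)
for element in elements:
    print(element.kind, repr(element.data), element.uncertain)
```

Keeping the uncertainty flag inside each element, rather than in a separate list, mirrors the abstract's statement that every element carries its own uncertainty information alongside the editable coded data.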