Title of invention: System and method for identification and extraction of data
Abstract: A system and method for describing target data as a sequence of pattern elements and pattern element groups that comprise an overall target pattern is described. Pattern elements may utilize regular expression syntax along with other metadata that describe the behavior of the element. A pattern element group may be a collection of fully defined pattern elements where at least one pattern element from the group must have a match for the overall pattern to match. Patterns contain both pattern elements and pattern element groups. The general process involves first performing optical character recognition (OCR) on the document, which in turn produces a sequence of text tokens representing the lines of text on each page of the document. The search algorithm may then apply each defined pattern to the entire document, capturing and/or extracting data that match each pattern's required elements and element groups.
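The element and group semantics described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `PatternElement` and `PatternElementGroup`, the `required` flag as the only piece of metadata, and the sample regular expressions are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class PatternElement:
    """Hypothetical pattern element: a regular expression plus metadata."""
    name: str
    regex: str
    required: bool = True  # assumed metadata; the abstract does not enumerate fields

    def match(self, line):
        # Returns an re.Match object, or None when the line does not match.
        return re.search(self.regex, line)


@dataclass
class PatternElementGroup:
    """Hypothetical group: matches when at least one member element matches."""
    elements: list

    def match(self, line):
        # Try each member in order and report the first one that matches.
        for element in self.elements:
            found = element.match(line)
            if found:
                return element, found
        return None


# Illustrative group covering two made-up ways a transcript might label a term.
term_header = PatternElementGroup([
    PatternElement("season_year", r"^(Fall|Spring|Summer) (\d{4})$"),
    PatternElement("academic_year", r"^(\d{4})-(\d{4}) Semester (\d)$"),
])
```

Calling `term_header.match(line)` on one OCR line returns the matching element together with its match object, or `None` if no member of the group matches that line.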
Publication number: US9589183(B2) Publication date: 2017.03.07
Application number: US201414552099 Filing date: 2014.11.24
Applicant: PARCHMENT, INC. Inventor: Brown Jason
Classification: G06K9/00; G06K9/18; G06F17/30 Main classification: G06K9/00
Agency: Sheridan Ross P.C. Agent: Sheridan Ross P.C.
Main claim: 1. A system for identifying and extracting text from an electronic document, the system comprising: one or more processors; memory; and a text identifier and extractor that receives the electronic document, generates a stream of text tokens representing a plurality of lines of text of the electronic document, matches a pattern to a portion of the stream of text tokens, and outputs the text in accordance with the matched pattern, wherein the pattern includes an ordered sequence of a plurality of pattern elements representing the plurality of lines of text, each pattern element of the plurality of pattern elements describes at least one text token, the text identifier and extractor matches a text token in the stream of text tokens to a pattern element and continues to consume text tokens from the stream of text tokens until a subsequent text token in the stream of text tokens is matched to a subsequent pattern element having a required attribute, the pattern element and the subsequent pattern element belong to the same pattern, and the electronic document is a transcript or certificate.
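One possible reading of the token-consumption step in the claim is sketched below. It is an interpretation under stated assumptions, not the claimed implementation: the names `Element` and `match_pattern` are hypothetical, tokens consumed between two matches are attributed to the element that matched most recently, and matching stops once the last element of the pattern has matched.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical pattern element with a 'required' attribute."""
    name: str
    regex: str
    required: bool = False


def match_pattern(tokens, elements):
    """Match an ordered list of elements against a list of text-line tokens.

    Returns a dict mapping element name to the tokens it captured, or None
    when the first element or any required element never matches.
    """
    # Find the first token that matches the first pattern element.
    start = next(
        (i for i, tok in enumerate(tokens) if re.search(elements[0].regex, tok)),
        None,
    )
    if start is None:
        return None

    captured = {elements[0].name: [tokens[start]]}
    current = 0
    last = len(elements) - 1

    for tok in tokens[start + 1:]:
        if current == last:
            break  # assumption: the pattern ends once its last element matched
        # Look ahead only as far as the next required element (inclusive).
        limit = next(
            (j for j in range(current + 1, len(elements)) if elements[j].required),
            last,
        )
        hit = next(
            (j for j in range(current + 1, limit + 1)
             if re.search(elements[j].regex, tok)),
            None,
        )
        if hit is None:
            # No later element matched: keep consuming into the current one.
            captured[elements[current].name].append(tok)
        else:
            current = hit
            captured[elements[hit].name] = [tok]

    # Every required element must have matched for the pattern to match.
    if any(e.required and e.name not in captured for e in elements):
        return None
    return captured
```

With an invented token stream such as `["Fall 2012", "ENG101 English I A", "Term GPA: 3.50"]` and two required elements for the term header and the GPA line, the course line would be consumed into the term element until the GPA line matches.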
Address: Scottsdale AZ US