Title of invention System for data extraction and processing
Abstract A system for extracting and interpreting information received in a human-readable format, typically PDF, assigning field tags to the extracted information, and transferring the tagged information to a data processing system so that it can be uploaded to that system automatically. The system provides an incoming document with a time stamp to differentiate it from other incoming documents; the incoming document may then be split into sections so that each section can be processed individually. Subsequently, context and information are extracted by allowing a processing engine to apply a predetermined set of rules, so that the extracted information can be ascribed meaning and assigned a field tag depending on that meaning. The system generates an editable output which is sent to a user.
Publication number US9558295(B2) Publication date 2017.01.31
Application number US201213981989 Filing date 2012.01.31
Applicant KeyWordLogic Limited Inventor Develyn Richard
Classification G06F17/30;G06K9/00;G06K9/72 Main classification G06F17/30
Agency Stites & Harbison, PLLC Agent Stites & Harbison, PLLC; Nagle, Jr. David W.
Principal claim 1. A method for extracting and processing data from an electronic document, the method comprising the steps of: (a) providing a data acquisition engine and a processing engine, wherein the data acquisition engine is capable of extracting data from the electronic document, which data comprises glyphs and at least one property spatially associated with the glyphs, and wherein said at least one property is selected from the group consisting of location, font, background, and shading graphical element; and (b) providing said processing engine with a plurality of rules and a backward tracking search algorithm, wherein the plurality of rules are provided with a hierarchy and the rules are applied in the order of the hierarchy, wherein at least one of the plurality of rules defines an anchor point, which anchor point comprises glyphs having a defined format, and wherein at least some of the plurality of rules describe the relationship between the extracted data and the at least one property; wherein said processing engine is capable of providing a data output having a defined format, and wherein in use, the processing engine analyses the extracted data by applying the plurality of rules upon the anchor point to determine the probability of whether the property and extracted data meet the requirements of the rules, such that the processing engine determines the best fit of the extracted data to the format of the data output and produces the data output, wherein the method allows a plurality of anchor points to be defined by the plurality of rules.
Address Blyth, Northumberland GB
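
Illustrative sketch (not part of the patent record). The main claim above describes rules applied in hierarchy order, anchor points made of glyphs with a defined format, and a best-fit choice based on how well the extracted data and its spatial properties satisfy each rule. The following Python is one possible reading of that idea, not the patented implementation. Every name in it (`Run`, `Rule`, `fit_score`, `extract`), the scoring formula, the sample data, and the assumption that a value sits to the right of its label are hypothetical. Only the location property is used, and the backward tracking search named in the claim is not modeled.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """A run of glyphs with its location on the page (hypothetical model)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Rule:
    """One rule: hierarchy rank, output field tag, anchor and value formats."""
    rank: int
    field_tag: str
    anchor_format: str  # regex the anchor glyphs must match in full
    value_format: str   # regex the extracted value must match in full


def fit_score(anchor: Run, candidate: Run, rule: Rule) -> float:
    """Return a score in [0, 1] for how well a candidate fits the rule.

    Assumed scoring, chosen only for illustration: the value must match the
    rule's format and sit to the right of the anchor; candidates on the same
    line and closer to the anchor score higher.
    """
    if not re.fullmatch(rule.value_format, candidate.text):
        return 0.0
    if candidate.x <= anchor.x:
        return 0.0
    same_line = 1.0 if abs(candidate.y - anchor.y) < 2.0 else 0.5
    closeness = 1.0 / (1.0 + (candidate.x - anchor.x) / 100.0)
    return same_line * closeness


def extract(runs: list[Run], rules: list[Rule]) -> dict[str, str]:
    """Apply rules in hierarchy order and keep the best-scoring value per tag."""
    output: dict[str, str] = {}
    for rule in sorted(rules, key=lambda r: r.rank):
        # Several runs may qualify as anchor points for the same rule.
        anchors = [r for r in runs if re.fullmatch(rule.anchor_format, r.text)]
        best_score, best_text = 0.0, None
        for anchor in anchors:
            for candidate in runs:
                if candidate is anchor:
                    continue
                score = fit_score(anchor, candidate, rule)
                if score > best_score:
                    best_score, best_text = score, candidate.text
        if best_text is not None:
            output[rule.field_tag] = best_text
    return output


# Hypothetical sample data standing in for the data acquisition engine's output.
sample_runs = [
    Run("Invoice No:", 50, 100),
    Run("INV-0042", 150, 100),
    Run("Date:", 50, 120),
    Run("2012-01-31", 150, 120),
    Run("INV-9999", 400, 300),
]
sample_rules = [
    Rule(2, "invoice_date", r"Date:", r"\d{4}-\d{2}-\d{2}"),
    Rule(1, "invoice_number", r"Invoice No:", r"INV-\d{4}"),
]

if __name__ == "__main__":
    print(extract(sample_runs, sample_rules))
```

In this sketch the tagged output is a plain dictionary keyed by field tag. A fuller version would presumably also weigh font, background, and shading, which the claim lists as alternative properties.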