Title FRAMEWORK FOR DATA EXTRACTION BY EXAMPLES
Abstract Various technologies described herein pertain to controlling automated programming for extracting data from an input document. Examples indicative of the data to extract from the input document can be received. The examples can include highlighted regions on the input document. Moreover, the input document can be a semi-structured document (e.g., a text file, a log file, a word processor document, a semi-structured spreadsheet, a webpage, a fixed-layout document, an image file, etc.). Further, an extraction program for extracting the data from the input document can be synthesized based on the examples. The extraction program can be synthesized in a domain-specific language (DSL) for a type of the input document. Moreover, the extraction program can be executed on the input document to extract an instance of an output data schema.
Publication number US2015254530(A1) Publication date 2015.09.10
Application number US201514636664 Filing date 2015.03.03
Applicant Microsoft Technology Licensing, LLC Inventors Gulwani Sumit; Le Vu Minh
Classification G06K9/62; G06K9/00 Main classification G06K9/62
Agency Agent
Main claim 1. A computing system, comprising: at least one processor; and memory comprising a data extraction system, the data extraction system being executable by the at least one processor, the data extraction system comprising: an interaction component configured to receive examples indicative of data to extract from an input document, the examples comprise highlighted regions on the input document, the input document being a semi-structured document; a synthesis component configured to synthesize an extraction program for extracting the data from the input document based on the examples, the extraction program synthesized in a domain-specific language (DSL) for a type of the input document; and an interpretation component configured to execute the extraction program on the input document to extract an instance of an output data schema.
Address Redmond WA US
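
Illustrative sketch (not part of the patent record). The claim describes three steps: receiving highlighted regions as examples, synthesizing an extraction program in a DSL, and executing that program on the document. The Python below is a minimal sketch of that flow for a plain-text log file, under strong assumptions: the one-operator DSL (`Between`), the shortest-delimiters-first search in `synthesize`, and the helper names `locate`, `run`, and `highlight` are all hypothetical and chosen for illustration. The record does not specify the actual DSL or the synthesis algorithm, and the output schema here is assumed to be a flat sequence of strings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A highlighted region: (line index, start offset, end offset). Hypothetical encoding.
Region = Tuple[int, int, int]


@dataclass(frozen=True)
class Between:
    """Toy DSL program: on a line, take the text after the first `left`
    up to the next `right` (an empty `right` means end of line)."""
    left: str
    right: str


def locate(program: Between, line: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span the program selects on one line, or None."""
    start = line.find(program.left)
    if start < 0:
        return None
    start += len(program.left)
    end = line.find(program.right, start) if program.right else len(line)
    if end < 0:
        return None
    return start, end


def synthesize(lines: List[str], examples: List[Region]) -> Optional[Between]:
    """Synthesis step: search delimiter pairs taken from the context of the
    first example, shortest first, and return the first one that reproduces
    every highlighted region exactly."""
    if not examples:
        return None
    i, s, e = examples[0]
    line = lines[i]
    max_right = len(line) - e
    for total in range(s + max_right + 1):
        for l in range(min(total, s) + 1):
            r = total - l
            if r > max_right:
                continue
            candidate = Between(line[s - l:s], line[e:e + r])
            if all(locate(candidate, lines[j]) == (a, b) for j, a, b in examples):
                return candidate
    return None


def run(program: Between, lines: List[str]) -> List[str]:
    """Interpretation step: execute the program on every line of the document."""
    out = []
    for line in lines:
        span = locate(program, line)
        if span is not None:
            out.append(line[span[0]:span[1]])
    return out


def highlight(lines: List[str], i: int, text: str) -> Region:
    """Simulate a user highlighting the first occurrence of `text` on line i."""
    start = lines[i].index(text)
    return (i, start, start + len(text))


# Usage: two highlighted examples on a made-up log file.
lines = [
    "2015-03-03 INFO user=alice action=login",
    "2015-03-04 WARN user=bob action=logout",
    "2015-03-05 INFO user=carol action=login",
]
examples = [highlight(lines, 0, "alice"), highlight(lines, 1, "bob")]
program = synthesize(lines, examples)
rows = run(program, lines) if program is not None else []
```

In this sketch the search prefers the shortest delimiters consistent with all examples, which is one simple ranking choice; a real system would need a richer DSL and a more careful ranking to generalize beyond such regular input.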