Title of invention FRAMEWORK FOR DATA EXTRACTION BY EXAMPLES
Abstract Various technologies described herein pertain to controlling automated programming for extracting data from an input document. Examples indicative of the data to extract from the input document can be received. The examples can include highlighted regions on the input document. Moreover, the input document can be a semi-structured document (e.g., a text file, a log file, a word processor document, a semi-structured spreadsheet, a webpage, a fixed-layout document, an image file, etc.). Further, an extraction program for extracting the data from the input document can be synthesized based on the examples. The extraction program can be synthesized in a domain specific language (DSL) for a type of the input document. Moreover, the extraction program can be executed on the input document to extract an instance of an output data schema.
Publication number US2017091589(A1) Publication date 2017.03.30
Application number US201615376638 Filing date 2016.12.12
Applicant Microsoft Technology Licensing, LLC Inventors Gulwani Sumit;Le Vu Minh
Classification G06K9/62;G06F3/0484;G06K9/00;G06F17/24 Main classification G06K9/62
Agency Agent
Principal claim 1. A computing system, comprising: at least one processor; and memory that comprises computer-executable instructions that, when executed by the at least one processor, cause the at least one processor to perform acts including: causing input regions of a document to be highlighted in a uniform user interface, the input regions of the document being examples of data to extract from the document, the uniform user interface being for a plurality of different document types; and causing output regions of the document to be highlighted in the uniform user interface, the output regions being indicative of an instance of an output data schema, the instance of the output data schema being inferred from the document based on the examples of the data to extract from the document.
Address Redmond WA US
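
The abstract describes synthesizing an extraction program from highlighted example regions and then executing it on the input document. The Python sketch below illustrates that general idea only; it is not the DSL or the synthesis algorithm of the application, neither of which is given in this record. It assumes a toy DSL in which a program is a pair of literal delimiters `(left, right)`, and it treats highlighted regions as character offsets into a text log. The names `contexts`, `synthesize`, and `run`, and the sample log, are hypothetical.

```python
import re
from typing import List, Optional, Set, Tuple

Region = Tuple[int, int]    # (start, end) offsets of a highlighted region
Program = Tuple[str, str]   # toy DSL (assumption): text between two delimiters


def contexts(doc: str, start: int, end: int, k: int = 8) -> Tuple[Set[str], Set[str]]:
    """Candidate delimiters: up to k characters left and right of a region,
    restricted to the line that contains the region."""
    line_start = doc.rfind("\n", 0, start) + 1
    line_end = doc.find("\n", end)
    if line_end == -1:
        line_end = len(doc)
    left, right = doc[line_start:start], doc[end:line_end]
    lefts = {left[-n:] for n in range(1, min(k, len(left)) + 1)}
    rights = {right[:n] for n in range(1, min(k, len(right)) + 1)}
    return lefts, rights


def run(program: Program, doc: str) -> List[str]:
    """Execute a toy program: return every string found between the delimiters."""
    left, right = program
    return re.findall(re.escape(left) + r"(.*?)" + re.escape(right), doc)


def synthesize(doc: str, examples: List[Region]) -> Optional[Program]:
    """Return a program consistent with every example, or None if none exists."""
    left_sets, right_sets = zip(*(contexts(doc, s, e) for s, e in examples))
    lefts = set.intersection(*left_sets)
    rights = set.intersection(*right_sets)

    def longest_first(cands: Set[str]) -> List[str]:
        # Prefer longer (more specific) delimiters; break ties alphabetically.
        return sorted(cands, key=lambda c: (-len(c), c))

    for left in longest_first(lefts):
        for right in longest_first(rights):
            output = run((left, right), doc)
            if all(doc[s:e] in output for s, e in examples):
                return (left, right)
    return None


# Hypothetical input document: a small semi-structured log file.
doc = (
    "2016-12-12 user=alice status=200\n"
    "2016-12-13 user=bob status=404\n"
    "2016-12-14 user=carol status=500\n"
)

# Two "highlighted" examples: the user names on the first two lines.
examples = [
    (doc.index("alice"), doc.index("alice") + len("alice")),
    (doc.index("bob"), doc.index("bob") + len("bob")),
]

program = synthesize(doc, examples)
extracted = run(program, doc) if program else []
print(program, extracted)
```

In this sketch the two examples are enough for the synthesized program to generalize to the third, unhighlighted line. A region with no text on one side of it on its line yields no candidate delimiter, so `synthesize` returns `None` in that case; a practical system would need a richer DSL than this one.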