Title: Building and maintaining information extraction rules
Abstract: Methods and arrangements for managing development of information extraction rules. One or more documents are opened for extraction. An interface is provided to create a label and thereupon label a portion of the document. The created label is stored, and an extractor is developed based on the labeling. A test interface is provided for the extractor, and results of a test conducted through the test interface are displayed. The extractor is exported. In accordance with at least one embodiment, developers are presented with automated guidance that eases the writing of extractors, thereby reducing the overall manual effort involved in extractor development. Generally, a focused, tutorial-type environment serves as a guide based on previously developed best practices.
Publication number: US9436660(B2) Publication date: 2016.09.06
Application number: US201213679349 Filing date: 2012.11.16
Applicant: International Business Machines Corporation Inventors: Carreno-Fuentes Arnaldo; Chiticariu Laura; Kandogan Eser; Li Yunyao; Yang Huahai
Classification: G06F17/00; G06F17/21; G06F17/24; G06F17/30 Main classification: G06F17/00
Agency: Ference & Associates LLC Agent: Ference & Associates LLC
Principal claim: 1. A method comprising: opening one or more documents for extraction; providing an interface to create a label and thereupon label a portion of the document; said providing of an interface comprising providing an extraction tasks view, a text editor and an extraction plan view; receiving at least one labeled example, wherein the at least one labeled example is labeled by a user and wherein the at least one labeled example identifies a portion of the document to extract; receiving at least one clue label created by the user, wherein the clue label indicates a reason for extraction of the at least one labeled example; storing the received at least one labeled example and at least one clue label in the extraction plan; developing an extractor using the at least one labeled example and the at least one clue label; said developing comprising conveying a predetermined structure for guiding a user and further comprising creating extraction rules based upon the at least one labeled example and the at least one clue label; wherein the predetermined extractor structure comprises the categories of: basic features, candidate generation, and consolidation; providing a test interface for the extractor; thereupon testing the extractor through the test interface; displaying results of a test of the extractor conducted through the test interface; permitting iteration of said steps of developing the extractor and testing the extractor; and exporting the extractor.
Address: Armonk, NY, US