Title of invention: OPEN INFORMATION EXTRACTION FROM THE WEB
Abstract: To implement open information extraction, a new extraction paradigm has been developed in which a system makes a single data-driven pass over a corpus of text, extracting a large set of relational tuples without requiring any human input. Using training data, a Self-Supervised Learner employs a parser and heuristics to determine criteria that will be used by an extraction classifier (or other ranking model) for evaluating the trustworthiness of candidate tuples that have been extracted from the corpus of text, by applying heuristics to the corpus of text. The classifier retains tuples with a sufficiently high probability of being trustworthy. A redundancy-based assessor assigns a probability to each retained tuple to indicate a likelihood that the retained tuple is an actual instance of a relationship between a plurality of objects comprising the retained tuple. The retained tuples comprise an extraction graph that can be queried for information.
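The flow described in the abstract (score candidate tuples, retain the trustworthy ones, assign a redundancy-based probability, then query the resulting graph) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and is not taken from the application: the sample tuples, the `trust_score` heuristic standing in for the learned classifier, the 0.5 threshold, and the noisy-or formula used as a simple stand-in for the redundancy-based assessor.

```python
from collections import Counter, defaultdict

# Hypothetical candidate tuples (arg1, relation, arg2); a repeated tuple
# represents the same assertion being extracted from several sentences.
CANDIDATES = [
    ("Edison", "invented", "the phonograph"),
    ("Edison", "invented", "the phonograph"),
    ("Edison", "invented", "the phonograph"),
    ("Seattle", "is located in", "Washington"),
    ("it", "was", "that"),
]

PRONOUNS = {"it", "that", "this", "he", "she", "they"}


def trust_score(tup):
    """Stand-in for the extraction classifier (assumption: a hand-written
    heuristic, not a trained model). Tuples with pronoun arguments score low."""
    arg1, _rel, arg2 = tup
    if arg1.lower() in PRONOUNS or arg2.lower() in PRONOUNS:
        return 0.1
    return 0.8


def assess(candidates, threshold=0.5, per_mention_precision=0.8):
    """Retain tuples scoring at or above the threshold, then assign each
    distinct tuple a probability that grows with its mention count.
    The noisy-or formula 1 - (1 - p)^k is an assumed simplification."""
    retained = [t for t in candidates if trust_score(t) >= threshold]
    counts = Counter(retained)
    return {t: 1.0 - (1.0 - per_mention_precision) ** k for t, k in counts.items()}


def build_graph(scored):
    """Index retained tuples by first argument so they can be queried."""
    graph = defaultdict(list)
    for (arg1, rel, arg2), prob in scored.items():
        graph[arg1].append((rel, arg2, prob))
    return dict(graph)


scored = assess(CANDIDATES)
graph = build_graph(scored)

# Query the graph: everything asserted about "Edison".
for rel, arg2, prob in graph.get("Edison", []):
    print(f"Edison {rel} {arg2} (p={prob:.3f})")
```

In this sketch a tuple seen in more sentences receives a higher probability than one seen once, which is the intuition behind a redundancy-based assessment; a real system would learn the classifier from self-labeled training data instead of hard-coding it.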
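Application publication number: US2011191276(A1)
Publication date: 2011.08.04
Application number: US20100970155
Filing date: 2010.12.16
Applicant: UNIVERSITY OF WASHINGTON THROUGH ITS CENTER FOR COMMERCIALIZATION
Inventors: CAFARELLA MICHAEL J.; BANKO MICHELE; ETZIONI OREN
Classification numbers: G06F15/18; G06F17/30
Main classification number: G06F15/18
Agency:
Agent:
Principal claim:
Address: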