Title of invention Method and system for information extraction
Abstract A method and a system for extracting information from a natural language text corpus based on a natural language query are disclosed. In the method, the natural language text corpus is analyzed with respect to the surface structure of word tokens and the surface syntactic roles of constituents, and the analyzed corpus is then indexed and stored. A natural language query is analyzed in the same way, with respect to the surface structure of word tokens and the surface syntactic roles of constituents. From the analyzed query, one or more surface variants are created; each surface variant is equivalent to the natural language query with respect to the lexical meaning of word tokens and the surface syntactic roles of constituents. The surface variants are then compared with the indexed and stored analyzed corpus, and each portion of text comprising a string of word tokens that matches the natural language query or any one of the surface variants is extracted from the indexed and stored analyzed corpus.
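The pipeline in the abstract (analyze and index the corpus, expand the query into surface variants, match the variants against the index) can be illustrated with a minimal Python sketch. This is not the patented implementation: the syntactic-role analysis is omitted entirely, lexical equivalence is approximated by a small hand-written synonym table, and every name here (`SYNONYMS`, `index_corpus`, `surface_variants`, `extract`) is hypothetical.

```python
import re
from itertools import product

# Hypothetical synonym table standing in for "equivalent lexical meaning".
SYNONYMS = {"bought": ["bought", "purchased", "acquired"]}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def index_corpus(sentences):
    """Build an inverted index: token -> set of sentence ids containing it."""
    index = {}
    for sid, sentence in enumerate(sentences):
        for token in tokenize(sentence):
            index.setdefault(token, set()).add(sid)
    return index


def surface_variants(query):
    """Return every token sequence obtained by swapping in listed synonyms.

    The original query tokens are included as one of the variants whenever
    each synonym list contains its own key, as it does in SYNONYMS above.
    """
    choices = [SYNONYMS.get(token, [token]) for token in tokenize(query)]
    return [list(combo) for combo in product(*choices)]


def contains_sequence(tokens, pattern):
    """True if pattern occurs as a contiguous run inside tokens."""
    n = len(pattern)
    return any(tokens[i:i + n] == pattern for i in range(len(tokens) - n + 1))


def extract(sentences, index, query):
    """Return the sentences that contain the query or one of its variants."""
    hits = set()
    for variant in surface_variants(query):
        if not variant:  # empty query: nothing to match
            continue
        # Use the index to keep only sentences containing every variant token.
        candidates = set.intersection(*(index.get(t, set()) for t in variant))
        for sid in candidates:
            if contains_sequence(tokenize(sentences[sid]), variant):
                hits.add(sid)
    return [sentences[sid] for sid in sorted(hits)]


if __name__ == "__main__":
    corpus = [
        "Acme purchased Widget Corp in 1999.",
        "Widget Corp bought nothing.",
        "Acme acquired Widget Corp last year.",
    ]
    for match in extract(corpus, index_corpus(corpus), "Acme bought Widget Corp"):
        print(match)
```

In this sketch the inverted index only narrows the candidate set; the contiguous-sequence check does the actual matching. A system closer to the abstract would also store syntactic roles (for example subject and object) per constituent and require variants to preserve them, which is what would stop "Widget Corp bought Acme" from matching a query about Acme buying Widget Corp.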
Publication number US6842730(B1) Publication date 2005.01.11
Application number US20000599563 Filing date 2000.06.23
Applicant HAPAX LIMITED Inventors EJERHED EVA INGEGORD; BRAROE PETER A.
Classification G06F17/27;G06F17/30;(IPC1-7):G06F17/27;G06F12/20 Main classification G06F17/27
Agency Agent
Principal claim
Address