Title: Method and system for information extraction
Abstract: A method, a system and a computer program for extracting information from a natural language text corpus based on a natural language query are disclosed. The natural language text corpus is indexed and stored. A natural language query is analyzed with respect to phrases, phrase types, syntactic roles, word tokens of phrases, and lexical meaning of word tokens. One or more surface variants are created for at least one phrase of the natural language query. Each surface variant has the same phrase type as that phrase, and comprises a word token that is a lexical head and has the same lexical meaning as the word token that is the lexical head of that phrase. The surface variants and the at least one phrase of the natural language query are compared with the indexed and stored natural language text corpus. Portions of text are then extracted from the indexed and stored corpus, each portion comprising a string of word tokens that matches any one of the surface variants or the at least one phrase of the natural language query.
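The variant-and-match step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy lexicon, the Phrase type, and the function names surface_variants and extract are hypothetical, the query analysis is assumed to have been done already, and a linear scan over whitespace-split text stands in for the indexed and stored corpus.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: word token -> lexical-meaning identifier.
LEXICON = {
    "purchase": "BUY",
    "buy": "BUY",
    "acquire": "BUY",
    "car": "CAR",
    "automobile": "CAR",
}


@dataclass(frozen=True)
class Phrase:
    tokens: tuple      # word tokens of the phrase
    phrase_type: str   # e.g. "NP" or "VP"
    head_index: int    # position of the lexical head within tokens


def surface_variants(phrase):
    """Build variants with the same phrase type and a head of the same meaning."""
    head = phrase.tokens[phrase.head_index]
    meaning = LEXICON.get(head)
    variants = []
    for word, sense in LEXICON.items():
        if sense == meaning and word != head:
            tokens = list(phrase.tokens)
            tokens[phrase.head_index] = word  # swap only the lexical head
            variants.append(
                Phrase(tuple(tokens), phrase.phrase_type, phrase.head_index)
            )
    return variants


def contains(tokens, pattern):
    """True if pattern occurs as a contiguous run of word tokens."""
    n = len(pattern)
    return any(
        tuple(tokens[i:i + n]) == pattern
        for i in range(len(tokens) - n + 1)
    )


def extract(corpus, phrase):
    """Return portions of text matching the phrase or any surface variant."""
    candidates = [phrase] + surface_variants(phrase)
    hits = []
    for portion in corpus:
        tokens = portion.lower().split()  # naive tokenization (assumption)
        if any(contains(tokens, c.tokens) for c in candidates):
            hits.append(portion)
    return hits


corpus = [
    "She bought a new automobile yesterday",
    "The new automobile was red",
    "He sold his old car",
]
query_phrase = Phrase(("new", "car"), "NP", 1)
matches = extract(corpus, query_phrase)
```

Here the noun phrase "new car" yields the single variant "new automobile", so the first two portions are extracted and the third is not. A real system would replace the linear scan with a lookup against the stored index.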
Publication number: US7657425(B2)  Publication date: 2010.02.02
Application number: US20070723079  Application date: 2007.03.16
Applicant: HAPAX LIMITED  Inventors: EJERHED EVA INGEGERD; BRAROE PETER A.
Classification: G06F17/27; G06F17/20; G06F17/30  Main classification: G06F17/27
Agency:  Agent:
Principal claim:
Address: