Title: Natural language question expansion and extraction
Abstract: Methods, computer program products and systems for generating at least one factual question from a set of seed question and answer pairs. One method includes: obtaining at least one seed question and answer pair from the set of seed question and answer pairs; extracting a set of features associated with the at least one seed question and answer pair using at least one common analysis system (CAS) in a set of CASs and a specific knowledge base; generating a set of candidate questions from the extracted set of features using a logistic regression algorithm and the specific knowledge base, wherein each candidate question includes an expansion of each of the extracted set of features; and ranking each candidate question relative to a remainder of candidate questions in the set of candidate questions based on the extracted set of features and the at least one seed question and answer pair.
Publication number: US9535898(B2)
Publication date: 2017.01.03
Application number: US201313760639
Filing date: 2013.02.06
Applicant: International Business Machines Corporation
Inventors: Baughman Aaron K.; Boyer Linda M.; Chuang Wesley T.; Ding Meilan; Gee William R.; Sakthi Palani
Classification: G06N5/02; G06F17/27
Main classification: G06N5/02
Agency: Hoffman Warnick LLC
Agent: Simek Daniel; Hoffman Warnick LLC
Main claim: 1. A computer-implemented method for generating at least one candidate question from a set of seed question and answer pairs, the method comprising: obtaining at least one seed question and answer pair from the set of seed question and answer pairs; extracting a set of features associated with the at least one seed question and answer pair using at least one common analysis system (CAS) in a set of CASs and a specific knowledge base; generating a set of candidate questions from the extracted set of features using a logistic regression algorithm and the specific knowledge base, wherein each candidate question includes an expansion of each of the extracted set of features; ranking each candidate question relative to a remainder of candidate questions in the set of candidate questions based on the extracted set of features and the at least one seed question and answer pair; and re-ranking the set of candidate questions after the ranking, wherein the re-ranking includes: extracting a set of features from the candidate question using at least one CAS in the set of CASs and the specific knowledge base; generating a set of modified candidate questions from the extracted set of features from the candidate question using the logistic regression algorithm and the specific knowledge base, wherein each modified candidate question includes an expansion of each of the extracted set of features from the candidate question; and ranking each modified candidate question relative to a remainder of modified candidate questions in the set of modified candidate questions based on the extracted set of features from the candidate question, wherein the re-ranking is performed after the extracting of the set of features associated with the at least one seed question and answer pair and the generating of the set of candidate questions from the extracted set of features.
Address: Armonk, NY, US
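
Illustrative sketch (not part of the patent record): the steps recited in claim 1 (feature extraction, candidate generation, ranking, re-ranking) can be pictured with a small Python toy. Everything in it is an assumption made for illustration. The record does not disclose the knowledge base contents, the feature set, or how the logistic regression is applied, so `KNOWLEDGE_BASE`, `WEIGHTS`, the substring-based `extract_features` (a stand-in for a CAS annotator), and the token-overlap ranking are all hypothetical. Here the logistic model is assumed to accept or reject knowledge-base expansions, and its weights are hand-set rather than trained.

```python
import itertools
import math

# Hypothetical toy knowledge base: term -> (expansion, relatedness) pairs.
KNOWLEDGE_BASE = {
    "capital": [("largest city", 0.9), ("official language", 0.4)],
    "france": [("germany", 0.8), ("spain", 0.7)],
    "germany": [("austria", 0.8)],
    "spain": [("portugal", 0.8)],
}

# Hand-set logistic regression parameters (assumed, not trained).
WEIGHTS = {"bias": -2.0, "relatedness": 4.0}


def extract_features(question):
    """Stand-in for a CAS annotator: knowledge-base terms found in the text."""
    text = question.lower()
    return [term for term in KNOWLEDGE_BASE if term in text]


def acceptance_probability(relatedness):
    """Logistic model: probability that an expansion gives a usable question."""
    z = WEIGHTS["bias"] + WEIGHTS["relatedness"] * relatedness
    return 1.0 / (1.0 + math.exp(-z))


def generate_candidates(question, threshold=0.5):
    """Expand every extracted feature with each accepted expansion."""
    features = extract_features(question)
    if not features:
        return []
    options = []
    for term in features:
        scored = [(exp, acceptance_probability(rel))
                  for exp, rel in KNOWLEDGE_BASE[term]]
        options.append([(exp, p) for exp, p in scored if p >= threshold])
    candidates = []
    for combo in itertools.product(*options):
        text, score = question.lower(), 1.0
        for term, (expansion, prob) in zip(features, combo):
            text = text.replace(term, expansion)
            score *= prob
        candidates.append((text, score))
    return candidates


def jaccard(a, b):
    """Token overlap between two questions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)


def rank(candidates, seed_question, seed_answer):
    """Order candidates by model score weighted by similarity to the seed."""
    kept = [(text, score * jaccard(text, seed_question))
            for text, score in candidates
            if seed_answer.lower() not in text]  # drop answer leaks
    return sorted(kept, key=lambda c: c[1], reverse=True)


def rerank(ranked, threshold=0.5):
    """Repeat extraction and expansion on each candidate, then rank the results."""
    modified = []
    for text, _ in ranked:
        modified.extend(generate_candidates(text, threshold))
    return sorted(modified, key=lambda c: c[1], reverse=True)


seed = ("What is the capital of France?", "Paris")
ranked = rank(generate_candidates(seed[0]), *seed)
modified = rerank(ranked)
```

With the toy data above, `ranked` holds the expanded questions about Germany and Spain, and `modified` holds a second generation of questions derived from them. A real system would replace the substring matcher with actual annotators and fit the logistic weights on labeled question data.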