Title: Mining forums for solutions to questions and scoring candidate answers
Abstract: An approach is provided for mining threaded online discussions. In the approach, performed by an information handling system, a natural language processing (NLP) analysis is performed on threaded discussions pertaining to a given topic. The analysis is performed across multiple web sites with each of the web sites including one or more threaded discussions. The analysis results in harvested discussions pertaining to the topic. The harvested discussions are correlated and a question is identified from the harvested discussions. A set of candidate answers is also identified from the harvested discussions, with one of the candidate answers being selected as the most likely answer to the identified question.
Publication number: US9471874(B2) Publication date: 2016.10.18
Application number: US201314099926 Filing date: 2013.12.07
Applicant: International Business Machines Corporation Inventors: Byron Donna K.; LaVoie Jason D.
Classification: G06N5/04;G06N99/00;G06F17/27 Main classification: G06N5/04
Agency: VanLeeuwen & VanLeeuwen Agent: VanLeeuwen & VanLeeuwen; Sarbakhsh Reza
Principal claim: 1. A method, in an information handling system comprising a processor and a memory, of mining threaded online discussions, the method comprising: performing, by the information handling system, a natural language processing (NLP) analysis of one or more threaded discussions pertaining to a given topic, wherein the analysis is performed across one or more web sites with each of the web sites including one or more of the threaded discussions, wherein the analysis results in a plurality of harvested discussions; correlating the plurality of harvested discussions across a plurality of threads from the one or more web sites; identifying a question from the harvested discussions; identifying a plurality of candidate answers from the harvested discussions, wherein each of the plurality of candidate answers pertain to the identified question; aggregating and merging a selected plurality of harvested discussions corresponding to each of the candidate answers, wherein the selected plurality of harvested discussions are supporting evidence corresponding to the respective candidate answer; generating a supporting evidence score based on one or more factors of the supporting evidence for each of the candidate answers, wherein at least one of the factors is selected from the group consisting of a quality of the supporting evidence, and a quantity of the supporting evidence; generating an answer post score for each of the candidate answers based on an identification of a rating within the threaded discussions pertaining to the respective candidate answer; generating a post provider score for each of the candidate answers based on an identified expertise level that corresponds to a provider of the respective candidate answer; generating a follow-up score for each of the candidate answers based on one or more follow-up comments from posters that indicate that the respective candidate answer was correct; and scoring each of the plurality of candidate answers, wherein the scoring calculates an overall score corresponding to each of the candidate answers, wherein the overall score is based upon one or more component scores selected from the group consisting of the supporting evidence score, the answer post score, the post provider score, and the follow-up score, and wherein a selected answer has the highest overall score when compared to the other candidate answers.
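The final scoring step of the claim can be illustrated with a minimal Python sketch. The claim names four component scores and states that the selected answer has the highest overall score, but it does not say how the components are combined. The weighted sum, the weight values, the 0-to-1 score range, and the names `CandidateAnswer`, `WEIGHTS`, `overall_score`, and `select_answer` are all assumptions made for illustration, not the patented implementation.

```python
from dataclasses import dataclass


@dataclass
class CandidateAnswer:
    """One candidate answer with its four component scores (assumed 0..1)."""
    text: str
    supporting_evidence: float  # quality/quantity of merged supporting discussions
    answer_post: float          # rating found within the thread for this post
    post_provider: float        # identified expertise level of the poster
    follow_up: float            # follow-up comments saying the answer was correct


# Hypothetical weights: the claim does not specify a combination rule.
WEIGHTS = {
    "supporting_evidence": 0.4,
    "answer_post": 0.2,
    "post_provider": 0.2,
    "follow_up": 0.2,
}


def overall_score(candidate: CandidateAnswer) -> float:
    """Combine the component scores as a weighted sum (an assumed rule)."""
    return sum(weight * getattr(candidate, name) for name, weight in WEIGHTS.items())


def select_answer(candidates: list[CandidateAnswer]) -> CandidateAnswer:
    """Return the candidate with the highest overall score."""
    if not candidates:
        raise ValueError("at least one candidate answer is required")
    return max(candidates, key=overall_score)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    answers = [
        CandidateAnswer("Reinstall the driver", 0.9, 0.5, 0.5, 1.0),
        CandidateAnswer("Reboot the machine", 0.2, 0.9, 0.3, 0.0),
    ]
    best = select_answer(answers)
    print(best.text, round(overall_score(best), 2))
```

Because the claim allows the overall score to be based on "one or more" of the component scores, setting a weight to zero in this sketch drops the corresponding component.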
Address: Armonk NY US