Abstract |
In a method of answering questions and scoring answers, a title and at least one topical field are identified for a document. A field name and field content associated with the topical field are identified, and a title-oriented document is created by combining the title, the field name, and the field content associated with the topical field. For each title-oriented document, a term in the title is matched to previously established categories to produce a title concept identifier. The topical field is synthesized to produce a field concept identifier and a field content concept identifier. A question is received. A question topic term and a question content identifier derived from the question are used to identify at least one question-matching relation instance. The title concept identifier of each question-matching relation instance is identified as a candidate answer to the question. Each candidate answer and a corresponding answer score are output. |
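The first step of the abstract, building title-oriented documents, can be sketched as follows. This is a minimal illustration only, not the patented implementation: it assumes a source document is already available as a title plus a mapping from field names to field content. The names `TitleOrientedDocument` and `build_title_oriented_documents`, and the sample document, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TitleOrientedDocument:
    """Hypothetical container: a title combined with one topical field."""
    title: str
    field_name: str
    field_content: str


def build_title_oriented_documents(
    title: str, fields: Dict[str, str]
) -> List[TitleOrientedDocument]:
    # Assumption: one title-oriented document is produced per topical field.
    return [
        TitleOrientedDocument(title, name, content)
        for name, content in fields.items()
    ]


# Illustrative input only; not taken from the patent.
docs = build_title_oriented_documents(
    "Aspirin",
    {"Uses": "pain relief", "Side effects": "stomach upset"},
)
for doc in docs:
    print(doc)
```

Each resulting record keeps the title attached to the field it came from, which is what later lets a matching field point back to the title as a candidate answer.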
Principal claim |
1. A computer system for scoring answers to questions in a question-answering system, comprising:
a processor comprising an automated question answering (QA) system comprising:
a tangible storage device operatively connected to said processor, said tangible storage device storing a corpus of data comprising natural language documents; and
a user interface operatively connected to said processor, said user interface receiving a question into said automated QA system,
said processor constructing title-oriented documents from said corpus of data, each title-oriented document comprising a title and a topical field, said topical field comprising a field name and field content associated with said topical field of a document in said corpus of data,
said processor creating a relation instance by combining a field identifier for said topical field, a title concept identifier, and a corresponding field content concept identifier,
said processor calculating a count for each said relation instance based on a number of occurrences of said title concept identifier and said field content concept identifier within a corresponding document in said corpus of data,
said processor analyzing terms in said question, said analyzing identifying a question content identifier based on previously established question term categories,
said processor comparing said question content identifier to said relation instance, said comparing identifying a question-matching relation instance,
said processor generating an answer to said question by identifying said title concept identifier of each said question-matching relation instance as a candidate answer to said question, and
said processor generating a score for said candidate answer by adding each said count within each said relation instance corresponding to said candidate answer. |
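The matching and scoring steps recited in the claim can be sketched as below. This is a hedged illustration rather than the claimed system: it assumes relation instances and their counts have already been computed, and that "comparing" means exact equality between the question content identifier and the field content concept identifier. `RelationInstance`, `score_candidates`, and all identifiers in the sample data are hypothetical.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class RelationInstance(NamedTuple):
    """Hypothetical record: (field identifier, title concept, content concept) plus a count."""
    field_id: str
    title_concept: str
    content_concept: str
    count: int


def score_candidates(
    relations: List[RelationInstance], question_content_id: str
) -> List[Tuple[str, int]]:
    scores = defaultdict(int)
    for rel in relations:
        # Assumption: a relation instance matches the question when its
        # field content concept identifier equals the question content identifier.
        if rel.content_concept == question_content_id:
            # The title concept is the candidate answer; its score is the
            # sum of counts over all matching relation instances.
            scores[rel.title_concept] += rel.count
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative data only; not taken from the patent.
relations = [
    RelationInstance("treats", "C_ASPIRIN", "C_HEADACHE", 3),
    RelationInstance("treats", "C_IBUPROFEN", "C_HEADACHE", 2),
    RelationInstance("side_effect", "C_ASPIRIN", "C_HEADACHE", 1),
    RelationInstance("treats", "C_ASPIRIN", "C_FEVER", 4),
]
ranked = score_candidates(relations, "C_HEADACHE")
print(ranked)
```

With this sample data, `C_ASPIRIN` accumulates the counts of its two matching relation instances (3 + 1), while `C_IBUPROFEN` has a single matching instance with count 2, so the former ranks first.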