Title of invention: Linguistically-adapted structural query annotation
Abstract: A system and method for natural language processing of queries are provided. A lexicon includes text elements that are recognized as being a proper noun when capitalized. A natural language query includes a sequence of text elements including words. The query is processed. The processing includes a preprocessing step, in which part of speech features are assigned to the text elements in the query. This includes identifying, from a lexicon, a text element in the query which starts with a lowercase letter and assigning recapitalization information to the text element in the query, based on the lexicon. This information includes a part of speech feature of the capitalized form of the text element. Then parts of speech for the text elements in the query are disambiguated, which includes applying rules for recapitalizing text elements based on the recapitalization information.
Publication number: US8812301(B2) Publication date: 2014.08.19
Application number: US201113245147 Filing date: 2011.09.26
Applicant: Xerox Corporation Inventors: Brun Caroline;Nikoulina Vassilina;Lagos Nikolaos
Classification: G06F17/20;G06F17/28;G06F17/27;G06F17/21;G10L21/00;G06F17/30 Main classification: G06F17/20
Agency: Fay Sharpe LLP Agent: Fay Sharpe LLP
Main claim: 1. A method for processing queries, comprising: providing access to a lexicon in which a set of text elements that each start with a lowercase letter are each recognized in the lexicon as being a proper noun when in a capitalized form, each of text elements in the set of text elements starting with a lowercase letter being linked in the lexicon to a respective capitalized form; receiving a natural language query to be processed, the query comprising a sequence of text elements, the text elements comprising words; with a computer processor, processing the query comprising: assigning part of speech (POS) features from a list of POS features to the text elements in the query, including: for a text element in the query which starts with a lowercase letter and which is among the set of text elements in the lexicon that are recognized as being a proper noun when in a capitalized form, assigning recapitalization information to the query text element, the recapitalization information comprising a part of speech feature of the capitalized form; disambiguating part of speech features for the text elements in the query including applying rules for recapitalizing text elements based on the recapitalization information; and chunking the disambiguated query; and outputting the processed query.
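Illustrative sketch (not part of the patent record): the steps recited in claim 1 can be pictured as a small pipeline of lexicon lookup, POS disambiguation with a recapitalization rule, and chunking. Everything named here is a hypothetical stand-in chosen for illustration: the `LEXICON` and `BASE_POS` tables, the tag names, the "recapitalize after a preposition" rule, the `PREFERENCE` tie-break, and the noun-phrase grouping. The patent does not disclose these specifics in this record, so this is a toy under stated assumptions rather than the claimed implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical lexicon: lowercase form -> (capitalized form, POS of capitalized form).
LEXICON = {
    "paris": ("Paris", "PROPER_NOUN"),
    "mozart": ("Mozart", "PROPER_NOUN"),
    "rose": ("Rose", "PROPER_NOUN"),
}

# Hypothetical candidate POS features for ordinary lowercase words.
BASE_POS = {
    "rose": {"NOUN", "VERB"},
    "letters": {"NOUN"},
    "of": {"PREP"},
    "a": {"DET"},
    "red": {"ADJ"},
}

# Assumed tie-break order when several readings survive (toy heuristic).
PREFERENCE = ["NOUN", "VERB", "ADJ", "DET", "PREP"]
NP_TAGS = {"DET", "ADJ", "NOUN", "PROPER_NOUN"}


@dataclass
class Token:
    text: str
    pos: set
    recap: Optional[Tuple[str, str]] = None  # (capitalized form, its POS)


def preprocess(query):
    """Assign candidate POS features and recapitalization information."""
    tokens = []
    for word in query.split():
        pos = set(BASE_POS.get(word, set()))
        recap = None
        if word[0].islower() and word in LEXICON:
            recap = LEXICON[word]
            pos.add(recap[1])  # keep the proper-noun reading as a candidate
        if not pos:
            pos = {"NOUN"}  # assumed fallback for unknown words
        tokens.append(Token(word, pos, recap))
    return tokens


def disambiguate(tokens):
    """Pick one POS per token; recapitalize when an (assumed) rule fires."""
    for i, tok in enumerate(tokens):
        if tok.recap is not None:
            cap_form, cap_pos = tok.recap
            only_proper = tok.pos == {cap_pos}
            after_prep = i > 0 and tokens[i - 1].pos == {"PREP"}
            if only_proper or after_prep:
                tok.text, tok.pos = cap_form, {cap_pos}
                continue
            tok.pos.discard(cap_pos)  # rule did not fire: drop proper-noun reading
        if len(tok.pos) > 1:
            tok.pos = {min(tok.pos, key=PREFERENCE.index)}
    return tokens


def chunk(tokens):
    """Group adjacent nominal tokens into NP chunks; others stand alone."""
    chunks = []
    for tok in tokens:
        (tag,) = tok.pos
        label = "NP" if tag in NP_TAGS else tag
        if label == "NP" and chunks and chunks[-1][0] == "NP":
            chunks[-1][1].append(tok.text)
        else:
            chunks.append((label, [tok.text]))
    return chunks


def process_query(query):
    return chunk(disambiguate(preprocess(query)))
```

With these toy tables, `process_query("letters of mozart")` restores the capital in "Mozart" because the proper-noun reading is the only candidate, while `process_query("a red rose")` keeps "rose" as a common noun because the assumed rule does not fire after an adjective.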
Address: Norwalk CT US