Title of invention: Query parser derivation computing device and method for making a query parser for parsing unstructured search queries
Abstract: A system and method are provided which may comprise parsing an unstructured geographic web-search query into a field-based format, by utilizing conditional random fields, learned by semi-supervised automated learning, to parse structured information from the unstructured geographic web-search query. The system and method may also comprise establishing semi-supervised conditional random fields utilizing one of a rule-based finite state machine model and a statistics-based conditional random field model. Systematic geographic parsing may be used with the one of the rule-based finite state machine model and the statistics-based conditional random field model. Parsing an unstructured local geographical web-based query in the local domain may be done by applying a learned model parser to the query, using at least one class-based query log from a form-based query system. The learned model parser may comprise at least one class-level n-gram language model-based feature harvested from a structured query log.
Publication number: US9218390(B2)  Publication date: 2015.12.22
Application number: US201113194887  Filing date: 2011.07.29
Applicant: YELLOWPAGES.COM LLC  Inventors: Feng Donghui; Boydston Kirk; Murray Nathaniel A.; Retzer Clarke; Shanahan James G.; Zajac Remi
Classification: G06F7/00; G06F17/30  Main classification: G06F7/00
Agency: Alston & Bird LLP  Agent: Alston & Bird LLP
Main claim: 1. A method comprising: deriving, via a query parser derivation computing device, a query parser for parsing an unstructured geographic web-search query into a field-based format, the deriving of the query parser comprising: receiving an input query, wherein the input query comprises a series of tokens; assigning a label to each of a plurality of the tokens; calculating the most probable label sequence for the input query; assigning one or more sentences from a plurality of sentences to each label based at least in part on the most probable label sequence for the input query, wherein: the one or more sentences are different from the labels; and the one or more sentences are assigned so that the respective sentence identifies the respective label as corresponding to one or more of a search term, a geographic expression, a geographic expression relation indication, and/or uninteresting information; creating a conditional random field model based at least in part on i) the tokens, ii) the labels, iii) characterizing a set of one or more feature functions, wherein: the set of one or more feature functions represent a state transition feature and/or one or more features of an output state for an input sequence; and a conditional probability is computed based in part on the set of one or more feature functions; training the one or more state transition features and the one or more output state features on a labeled set, wherein learning the state transition feature is limited on learning the one or more features of the output state; and utilizing, by the query parser, conditional random fields, learned by semi-supervised automated learning and based at least in part on the training, to produce structured information from the unstructured geographic web-search query, wherein the utilizing the conditional random fields to produce the structured information comprises: parsing the unstructured geographic web-search query to produce the structured information from the unstructured geographic web-search query; determining that the parsing the unstructured geographic web-search query results in a multiple interpretation condition, where the parsing identifies at least a first interpretation of the unstructured geographic web-search query corresponding to first parsing results and a second interpretation of the unstructured geographic web-search query corresponding to second parsing results; and based at least in part on user behavior data, disambiguate the first parsing results and the second parsing results to select the first parsing results corresponding to the first interpretation of the unstructured geographic web-search query.
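Illustrative note: the claim's step of calculating the most probable label sequence for a tokenized query can be sketched with Viterbi decoding over a linear-chain model. This is a minimal sketch and not the patented implementation. The label names, word lists, and weights below are hypothetical and hand-set for illustration, whereas the claim describes feature weights learned from a labeled set. Because the normalizing constant of a conditional random field is the same for every label sequence of a given input, maximizing the unnormalized score also yields the most probable sequence.

```python
# Labels mirror the four categories named in the claim (names are hypothetical).
LABELS = ["TERM", "GEO", "GEO_REL", "OTHER"]

# Hypothetical lexicons standing in for learned output-state features.
GEO_WORDS = {"atlanta", "tucker", "ga"}
REL_WORDS = {"in", "near"}
FILLER_WORDS = {"the", "best", "please"}

# Hypothetical state-transition weights; unlisted pairs score 0.
TRANSITION = {
    ("GEO_REL", "GEO"): 1.0,
    ("TERM", "TERM"): 0.5,
    ("GEO", "GEO"): 0.5,
}


def state_score(token, label):
    """Output-state feature score for one token/label pair."""
    tok = token.lower()
    if tok in GEO_WORDS:
        return 2.0 if label == "GEO" else 0.0
    if tok in REL_WORDS:
        return 2.0 if label == "GEO_REL" else 0.0
    if tok in FILLER_WORDS:
        return 2.0 if label == "OTHER" else 0.0
    return 1.0 if label == "TERM" else 0.0


def transition_score(prev_label, cur_label):
    """State-transition feature score for two adjacent labels."""
    return TRANSITION.get((prev_label, cur_label), 0.0)


def viterbi(tokens):
    """Return the highest-scoring label sequence for the tokens."""
    if not tokens:
        return []
    best = [{lab: state_score(tokens[0], lab) for lab in LABELS}]
    back = []
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for cur in LABELS:
            # Best predecessor for this label at this position.
            prev = max(LABELS, key=lambda p: best[-1][p] + transition_score(p, cur))
            scores[cur] = (best[-1][prev] + transition_score(prev, cur)
                           + state_score(tok, cur))
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final label.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def parse(query):
    """Group the tokens of a whitespace-tokenized query into labeled fields."""
    tokens = query.split()
    fields = {}
    for tok, lab in zip(tokens, viterbi(tokens)):
        fields.setdefault(lab, []).append(tok)
    return {lab: " ".join(toks) for lab, toks in fields.items()}


print(parse("pizza in atlanta ga"))
```

With these hand-set weights, the query "pizza in atlanta ga" is split into a search term ("pizza"), a relation indication ("in"), and a geographic expression ("atlanta ga"). The sketch omits training, the sentence-assignment step, and the disambiguation by user behavior data that the claim also recites.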
Address: Tucker, GA, US