Title: Iteratively learning coreference embeddings of noun phrases using feature representations that include distributed word representations of the noun phrases
Abstract: Methods and apparatus related to determining coreference resolution using distributed word representations. Distributed word representations, indicative of syntactic and semantic features, may be identified for one or more noun phrases. For each of the one or more noun phrases, a referring feature representation and an antecedent feature representation may be determined, where the referring feature representation includes the distributed word representation, and the antecedent feature representation includes the distributed word representation augmented by one or more antecedent features. In some implementations, the referring feature representation may be augmented by one or more referring features. Coreference embeddings of the referring and antecedent feature representations of the one or more noun phrases may be learned. Distance measures between two noun phrases may be determined based on the coreference embeddings.
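The two feature representations described in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the toy word vectors, the averaging used to pool words into a phrase vector, and the function names `phrase_vector`, `referring_features`, and `antecedent_features` are all assumptions made for illustration. The only antecedent feature shown is a parse tree distance, which the main claim names as one such feature.

```python
from typing import Dict, List

# Hypothetical 3-dimensional distributed word representations (toy values).
WORD_VECTORS: Dict[str, List[float]] = {
    "barack": [0.9, 0.1, 0.0],
    "obama": [0.8, 0.2, 0.1],
    "he": [0.7, 0.0, 0.3],
}
DIM = 3


def phrase_vector(phrase: str) -> List[float]:
    """Average the known word vectors of a noun phrase (assumed pooling)."""
    words = [w for w in phrase.lower().split() if w in WORD_VECTORS]
    if not words:
        return [0.0] * DIM
    return [sum(WORD_VECTORS[w][i] for w in words) / len(words)
            for i in range(DIM)]


def referring_features(phrase: str) -> List[float]:
    """Referring representation: the distributed representation alone."""
    return phrase_vector(phrase)


def antecedent_features(phrase: str, parse_tree_distance: int) -> List[float]:
    """Antecedent representation: the distributed representation
    augmented by one antecedent feature (a parse tree distance)."""
    return phrase_vector(phrase) + [float(parse_tree_distance)]
```

In this sketch the referring vector has 3 dimensions and the antecedent vector has 4, which mirrors the claim's point that the two representations may differ in length.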
Publication number: US9514098(B1); Publication date: 2016.12.06
Application number: US201314141182; Filing date: 2013.12.26
Applicant: Google Inc.; Inventors: Subramanya Amarnag; Liu Jingyi; Pereira Fernando Carlos das Neves; Chen Kai; Ponte Jay; Al-Rfou′ Rami
Classification: G06F17/21; Main classification: G06F17/21
Agency: Middleton Reutlinger; Agent: Middleton Reutlinger
Main claim: 1. A computer implemented method useful for modifying a search query issued by a client device, comprising:
  identifying, by one or more computer systems, distributed word representations for a plurality of noun phrases, the distributed word representations indicative of syntactic and semantic features of the noun phrases;
  determining, by one or more of the computer systems for each of one or more of the noun phrases and based on labeled data, at least one training pair of a referring feature representation and an antecedent feature representation, wherein:
    the referring feature representation for the at least one training pair for a given noun phrase of the one or more noun phrases includes the distributed word representation for the given noun phrase, and
    the antecedent feature representation for the at least one training pair for the given noun phrase includes the distributed word representation for the given noun phrase augmented by one or more antecedent features, wherein the one or more antecedent features include a parse tree distance for the given noun phrase as a candidate antecedent noun phrase in the labeled data, the parse tree distance being a parse tree based distance between the given noun phrase as the candidate antecedent noun phrase and a corresponding referring noun phrase;
    wherein the referring feature representations are m-dimensional space vectors, the antecedent feature representations are n-dimensional space vectors, and wherein the m-dimensional space vectors vary in length from the n-dimensional space vectors;
  learning, by one or more of the computer systems, coreference embeddings of the referring and antecedent feature representations of the noun phrases, the learning comprising iteratively embedding the m-dimensional space vectors and the n-dimensional space vectors into a common k-dimensional space;
  identifying, by one or more of the computer systems after the learning of the coreference embeddings, a first text segment and a second text segment associated with the first text segment, wherein the second text segment is a search query issued by a client device of a user;
  identifying, by one or more of the computer systems in the first text segment, an occurrence of one or more candidate antecedent noun phrases;
  identifying, by one or more of the computer systems in the second text segment, an occurrence of the given noun phrase;
  determining, by one or more of the computer systems for the given noun phrase, distance measures, in the common k-dimensional space, between the given noun phrase and the one or more candidate antecedent noun phrases based on inner products of the coreference embeddings in the common k-dimensional space;
  determining, by one or more of the computer systems, for a candidate noun phrase of the candidate antecedent noun phrases, a score for the candidate noun phrase as an antecedent for the given noun phrase based on the distance measure between the given noun phrase and the candidate noun phrase;
  selecting, by one or more of the computer systems, the candidate noun phrase as the antecedent for the given noun phrase based on the determined score;
  modifying, by one or more of the computer systems, the search query issued by the client device, wherein modifying the search query comprises replacing the given noun phrase with the selected candidate noun phrase in response to selecting the candidate noun phrase as the antecedent for the given noun phrase; and
  providing, by one or more of the computer systems in response to the search query issued by the client device, search results that are responsive to the modified query that replaces the given noun phrase with the selected candidate noun phrase.
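The inference steps of the claim (embedding into a common k-dimensional space, scoring candidates by inner product, selecting an antecedent, and rewriting the query) might be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the projection matrices `REF_PROJ` and `ANT_PROJ` stand in for already-learned coreference embeddings and hold made-up values, the raw inner product is used directly as the score, and the helper names `project`, `select_antecedent`, and `rewrite_query` are hypothetical. The iterative learning step itself is not shown.

```python
import re
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]

# Hypothetical "learned" linear maps into a common k = 2 space.
REF_PROJ: Matrix = [[1.0, 0.0, 0.0],          # k x m, with m = 3
                    [0.0, 1.0, 0.0]]
ANT_PROJ: Matrix = [[1.0, 0.0, 0.0, 0.0],     # k x n, with n = 4
                    [0.0, 1.0, 0.0, -0.1]]


def project(matrix: Matrix, vec: Vector) -> List[float]:
    """Map a feature vector into the common k-dimensional space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def inner(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_antecedent(ref_vec: Vector,
                      candidates: Sequence[Tuple[str, Vector]]) -> str:
    """Return the candidate phrase whose embedding has the largest
    inner product with the referring embedding (assumed scoring rule)."""
    ref_emb = project(REF_PROJ, ref_vec)
    scored = [(inner(ref_emb, project(ANT_PROJ, vec)), phrase)
              for phrase, vec in candidates]
    return max(scored, key=lambda s: s[0])[1]


def rewrite_query(query: str, noun_phrase: str, antecedent: str) -> str:
    """Replace the first whole-word occurrence of the referring phrase."""
    pattern = r"\b" + re.escape(noun_phrase) + r"\b"
    return re.sub(pattern, lambda _m: antecedent, query, count=1)


# Toy data: a referring vector for "he" and two candidate antecedents,
# each with a trailing parse tree distance feature.
he_vec = [0.7, 0.0, 0.3]
candidates = [("Barack Obama", [0.85, 0.15, 0.05, 2.0]),
              ("the White House", [0.1, 0.9, 0.2, 1.0])]
best = select_antecedent(he_vec, candidates)
new_query = rewrite_query("how old is he", "he", best)
```

With these toy values the first candidate scores higher, so the query "how old is he" would be rewritten with that candidate in place of the pronoun. A whole-word match is used so that a short referring phrase such as "he" is not replaced inside a longer word such as "the".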
Address: Mountain View CA US