Title: Techniques for generating translation clusters
Abstract: A computer-implemented technique can include receiving, at a server including one or more processors, a source word in a source language. The technique can include determining, at the server, one or more potential translations for the source word in a target language different than the source language. The technique can include determining, at the server, one or more synonyms for each of the one or more potential translations to obtain a plurality of potential translations. The technique can include determining, at the server, one or more translation clusters using the plurality of potential translations and a clustering algorithm. Each translation cluster can contain all of the plurality of potential translations that have a similar denotation, and each of the plurality of potential translations that have a similar denotation can be included in a specific translation cluster. The technique can also include outputting, at the server, the one or more translation clusters.
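The expansion step described in the abstract (direct translations plus the synonyms of each, forming the plurality of potential translations) can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory `TRANSLATIONS` dictionary and `SYNSETS` list are hypothetical stand-ins for the translation source and the synonym datastore, and the example words are chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical bilingual dictionary: source word -> direct potential translations.
TRANSLATIONS: Dict[str, List[str]] = {
    "banco": ["bank", "bench"],
}

# Hypothetical synonym datastore: each entry is one synonym set of target-language words.
SYNSETS: List[FrozenSet[str]] = [
    frozenset({"bank", "depository"}),
    frozenset({"bench", "seat"}),
    frozenset({"bench", "workbench"}),
]


def potential_translations(source_word: str) -> List[str]:
    """Return the direct translations of source_word (empty if the word is unknown)."""
    return TRANSLATIONS.get(source_word, [])


def synonyms(word: str) -> Set[str]:
    """Return every word sharing at least one synonym set with `word`, excluding `word`."""
    result: Set[str] = set()
    for synset in SYNSETS:
        if word in synset:
            result |= synset
    result.discard(word)
    return result


def expand(source_word: str) -> Set[str]:
    """Union of the direct translations and all of their synonyms."""
    expanded: Set[str] = set()
    for t in potential_translations(source_word):
        expanded.add(t)
        expanded |= synonyms(t)
    return expanded


if __name__ == "__main__":
    print(sorted(expand("banco")))  # ['bank', 'bench', 'depository', 'seat', 'workbench']
```

The synonym lookup is deliberately naive (a linear scan over all synonym sets); a real datastore would index synonym sets by word.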
Publication number: US9311293 (B2)  Publication date: 2016.04.12
Application number: US201213600301  Filing date: 2012.08.31
Applicant: Google Inc.  Inventors: DeNero John; Bansal Mohit
Classification: G06F17/28; G06F17/27  Main classification: G06F17/28
Agency: Remarck Law Group PLC  Agent: Remarck Law Group PLC
Main claim: 1. A computer-implemented method, comprising: receiving, at a server from a computing device via a network, the server including one or more processors, a single source word in a source language, wherein the single source word is input by a user at the computing device; determining, at the server, one or more potential translations for the single source word in a target language different than the source language; determining, at the server, one or more synonyms for each of the one or more potential translations to obtain a plurality of potential translations, wherein the synonyms are stored in a datastore, and wherein the datastore can be accessed via a network; generating, at the server, one or more translation clusters using the plurality of potential translations and a first clustering algorithm and without using a context of the single source word, each translation cluster containing all of the plurality of potential translations that have a similar denotation, such that each of the plurality of potential translations that have a similar denotation is included in a specific translation cluster, each translation cluster including at least one distinct potential translation of the plurality of potential translations, and the one or more translation clusters collectively including all of the plurality of potential translations; and outputting, from the server to the computing device via the network, information based on the one or more translation clusters, wherein the first clustering algorithm is defined as:

$\mathcal{B} \leftarrow \{\, C \cap T_S : C \in \bigcup_{t \in T_S} \mathcal{C}_t \,\}$
$\mathcal{D} \leftarrow \emptyset$
for $B \in \mathcal{B}$ do
    if $\nexists\, B' \in \mathcal{B}$ such that $B \subset B'$ then
        add $B$ to $\mathcal{D}$
return $\mathcal{D}$

where $T_S$ represents the plurality of potential translations, $C$ represents a synonym set including a set of target-language words, $\mathcal{C}_t$ represents a set of synonym sets in which a specific potential translation $t$ appears, $B$ represents a source-specific synonym set, which is a subset of $T_S$, $\mathcal{B}$ represents a set of source-specific synonym sets, and $\mathcal{D}$ represents the one or more translation clusters for $T_S$.
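The clustering algorithm recited in the claim keeps only the maximal source-specific synonym sets: every synonym set that contains at least one potential translation is intersected with T_S, and an intersection is kept as a cluster unless it is a proper subset of another intersection. Below is a minimal Python sketch under stated assumptions: synonym sets are supplied as an in-memory list of frozensets, and the function name `cluster_translations` and the example data are hypothetical, not taken from the patent. In this sketch, a potential translation that appears in no synonym set ends up in no cluster.

```python
from typing import FrozenSet, Iterable, List, Set


def cluster_translations(
    translations: Iterable[str],
    synsets: Iterable[FrozenSet[str]],
) -> List[FrozenSet[str]]:
    """Return the maximal source-specific synonym sets for T_S.

    translations: T_S, the expanded set of potential translations.
    synsets: every synonym set C available in the (hypothetical) datastore.
    """
    t_s: FrozenSet[str] = frozenset(translations)

    # Source-specific synonym sets B = C ∩ T_S, taken over the synonym sets C
    # that contain at least one t in T_S (equivalently, non-empty intersections).
    # Storing them in a set collapses duplicate intersections.
    source_specific: Set[FrozenSet[str]] = set()
    for c in synsets:
        b = c & t_s
        if b:
            source_specific.add(b)

    # Keep B only if no other source-specific set B' is a proper superset of it.
    clusters: List[FrozenSet[str]] = []
    for b in source_specific:
        if not any(b < other for other in source_specific):
            clusters.append(b)
    return clusters


# Hypothetical example data.
EXAMPLE_T_S = {"bank", "depository", "bench", "seat", "workbench"}
EXAMPLE_SYNSETS = [
    frozenset({"bank", "depository", "vault"}),
    frozenset({"depository", "repository"}),  # yields {depository}, a proper subset
    frozenset({"bench", "seat"}),
    frozenset({"bench", "workbench"}),
    frozenset({"river", "stream"}),  # shares nothing with T_S, so it is ignored
]

if __name__ == "__main__":
    result = cluster_translations(EXAMPLE_T_S, EXAMPLE_SYNSETS)
    # Sort for a deterministic display; set iteration order is arbitrary.
    print(sorted(sorted(c) for c in result))
    # [['bank', 'depository'], ['bench', 'seat'], ['bench', 'workbench']]
```

Note that clusters may overlap ("bench" appears in two of them), which matches the algorithm as written: only strict containment causes a source-specific set to be discarded.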
Address: Mountain View, CA, US