Title of invention: Annotating maps with user-contributed pronunciations
Abstract: Systems and methods are provided to select a most typical pronunciation of a location name on a map from a plurality of user pronunciations. A server generates a reference speech model based on the user pronunciations, compares the user pronunciations with the speech model, and selects a pronunciation based on the comparison. Alternatively, the server compares the distances between one of the user pronunciations and every other user pronunciation and selects a pronunciation based on that comparison. The server then annotates the map with the selected pronunciation and provides the audio output of the location name to a user device upon a user's request.
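The first approach in the abstract (build a reference model, then pick the pronunciation closest to it) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each pronunciation has already been reduced to a fixed-length feature vector and that the "reference model" is simply the element-wise mean of those vectors; the record does not specify the feature extraction, the model type, or the distance measure. The names `reference_model` and `select_by_model` are hypothetical.

```python
import math
from typing import Dict, List


def reference_model(features: Dict[str, List[float]]) -> List[float]:
    """Element-wise mean of all feature vectors.

    Assumption: a plain mean vector stands in for the reference
    speech model; a real system would use a proper acoustic model.
    """
    vectors = list(features.values())
    if not vectors:
        raise ValueError("at least one pronunciation is required")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def select_by_model(features: Dict[str, List[float]]) -> str:
    """Return the user ID whose vector is closest to the reference model.

    Assumption: Euclidean distance is used as the comparison measure.
    """
    model = reference_model(features)
    return min(features, key=lambda user: math.dist(features[user], model))
```

Calling `select_by_model` with a dictionary that maps user IDs to feature vectors returns the ID of the pronunciation nearest the mean. Because a mean is sensitive to outliers, a single unusual recording can shift the reference model, which is one reason the pairwise-distance alternative may be preferable.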
Publication number: US8949125(B1) Publication date: 2015.02.03
Application number: US201012816563 Filing date: 2010.06.16
Applicant: Google Inc. Inventor: Chechik Gal
Classification: G10L15/00;G10L21/00;G08G1/123;G06F7/00;G06F17/00 Main classification: G10L15/00
Agency: Lerner, David, Littenberg, Krumholz & Mentlik, LLP Agent: Lerner, David, Littenberg, Krumholz & Mentlik, LLP
Main claim: 1. A method of selecting a user spoken utterance, the method comprising: receiving, at a processing device, a set of user spoken utterances of a text string, each spoken utterance being a pronunciation of the text string by a corresponding different user and comprising a location name or a point of interest; generating, at the processing device, a speech model based on the text string and the set of received user spoken utterances from each corresponding user; comparing the generated speech model to each pronunciation received in the set of user spoken utterances; selecting a given one of the received user spoken utterances, as a most typical pronunciation of the text string, based on measured distance values between the speech model for the selected user spoken utterance and every other generated speech model, wherein selecting the given user spoken utterance based on the measured distance values includes identifying either: a sequence of modeling states from one user spoken utterance having a shortest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances, or a sequence of modeling states from one user spoken utterance having a lowest average distance from other sequences for all other user spoken utterances in the plurality of user spoken utterances; annotating a mapping application with the selected pronunciation of the selected spoken utterance; and providing audio information of the selected user spoken utterance to a user device in response to selection of the location or point of interest by a user.
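The selection step in the claim (the state sequence with the lowest average distance to all other sequences) resembles a medoid selection, and can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: each utterance is assumed to be already decoded into a sequence of modeling-state labels, and Levenshtein edit distance is assumed as the distance between sequences, since the claim does not name a specific measure. The names `edit_distance` and `most_typical` are hypothetical.

```python
from typing import Dict, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two state-label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def most_typical(sequences: Dict[str, Sequence[str]]) -> str:
    """Return the user ID whose sequence has the lowest average
    distance to every other user's sequence."""
    if len(sequences) < 2:
        raise ValueError("need at least two utterances to compare")

    def average_distance(user: str) -> float:
        others = [s for u, s in sequences.items() if u != user]
        return sum(edit_distance(sequences[user], s) for s in others) / len(others)

    return min(sequences, key=average_distance)
```

Given a dictionary that maps user IDs to state sequences, `most_typical` returns the ID of the utterance to annotate on the map. The pairwise comparison costs a number of distance computations that grows quadratically with the number of utterances, which is acceptable for the small sets of recordings a single place name is likely to collect.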
Address: Mountain View CA US