Title: Filled Translation for Bootstrapping Language Understanding of Low-Resourced Languages
Abstract: Annotated training data (e.g., sentences) in a first language are used to generate annotated training data for a second language. For example, annotated sentences in English are manually collected first and are then used to generate annotated sentences in Chinese. The annotated training data includes slot labels, slot values, and carrier phrases. The carrier phrases are the portions of the training data that lie outside of a slot. The carrier phrases are translated from the first language into one or more translations in the second language. The translations may include machine translations as well as human translations. Entities for the slot values are determined for the translated sentences using content sources that include locale-dependent entities. The determined entities are used to fill the slots in the second-language translations. All or a portion of the resulting sentences may be used for training models in the second language.
Publication number: US2015127319(A1)  Publication date: 2015.05.07
Application number: US201314074358  Filing date: 2013.11.07
Applicant: Microsoft Corporation  Inventors: Hwang Mei-Yuh; Ni Yong
Classification: G06F17/28  Main classification: G06F17/28
Agency:  Agent:
Main claim: 1. A method for using training data in a first language to create training data in a second language, comprising: accessing the training data in the first language, which includes sentences that each comprise zero or more carrier phrases and zero or more slot labels with slot values; performing slot abstraction on at least a portion of the training data to create abstract sentences that each comprise zero or more carrier phrases and zero or more abstract tokens that replace the slot labels and the slot values; translating the abstract sentences to the second language; and replacing each of the abstract tokens with a locale-dependent entity for the slot type, in order to create the training data in the second language.
Address: Redmond WA US
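
The three steps recited in the main claim (slot abstraction, translation of the abstract sentence, and filling with locale-dependent entities) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `<label>value</label>` annotation syntax, the `__label_N__` abstract token format, the `TOY_MT` lookup table standing in for a machine or human translation step, and the `ZH_ENTITIES` lexicon standing in for a locale-dependent content source.

```python
import random
import re

# Hypothetical annotation syntax: <slot_label>slot value</slot_label>.
SLOT_RE = re.compile(r"<(\w+)>(.*?)</\1>")
# Hypothetical abstract token format: __slot_label_N__.
TOKEN_RE = re.compile(r"__(\w+?)_(\d+)__")

# Stand-in for a machine or human translation step (illustrative only).
TOY_MT = {"book a flight to __city_0__": "订一张去 __city_0__ 的机票"}

# Stand-in for a locale-dependent content source (illustrative only).
ZH_ENTITIES = {"city": ["北京", "上海"]}


def abstract_slots(sentence):
    """Replace each annotated slot with an abstract token; keep carrier phrases."""
    labels = []

    def to_token(match):
        labels.append(match.group(1))
        return f"__{match.group(1)}_{len(labels) - 1}__"

    return SLOT_RE.sub(to_token, sentence), labels


def translate(abstract_sentence):
    """Translate the abstract sentence; tokens are assumed to pass through intact."""
    return TOY_MT[abstract_sentence]


def fill_slots(translated, entities, rng):
    """Replace each abstract token with a re-annotated locale-dependent entity."""

    def to_entity(match):
        label = match.group(1)
        return f"<{label}>{rng.choice(entities[label])}</{label}>"

    return TOKEN_RE.sub(to_entity, translated)


def bootstrap(sentence, entities, rng):
    """Run abstraction, translation, and filling on one annotated sentence."""
    abstract_sentence, _ = abstract_slots(sentence)
    return fill_slots(translate(abstract_sentence), entities, rng)


if __name__ == "__main__":
    source = "book a flight to <city>Seattle</city>"
    print(bootstrap(source, ZH_ENTITIES, random.Random(0)))
```

In this sketch the English slot value ("Seattle") is discarded rather than translated, and the slot is refilled from the target-locale lexicon, which mirrors the claim's replacement of abstract tokens with locale-dependent entities. A real system would need a translation step that reliably preserves the abstract tokens, which the lookup table here simply assumes.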