Title: PARAMETER ADJUSTMENT METHOD USED FOR STATISTICAL MACHINE TRANSLATION
Abstract: The present invention relates to a parameter adjustment method for statistical machine translation. The method comprises: step 1, constructing the language model required for translation from a monolingual corpus; step 2, constructing a translation model from a bilingual parallel corpus; and step 3, adjusting the parameters with an objective function. The method addresses the prior-art problems that parameters are easily overfit or fall into a local optimum during parameter adjustment. It is easy to implement and can combine a larger number of features. In addition, because the objective function is convex, the global optimum can be reached during training.
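The convexity claim can be checked directly from the objective of the main claim. The derivation below is a sketch that assumes the candidate set C_s is held fixed while λ is optimized (for example, a fixed n-best list) and writes h = (h_1, …, h_M) as a feature vector. Every exponent is affine in λ, and log-sum-exp composed with an affine map is convex; the first term is linear. The Hessian is therefore a sum of covariance matrices, which are positive semidefinite:

$$\mathcal{L}(\lambda)=\sum_{s=1}^{n}\Big[-\lambda^{\top}h(e_s,f_s)+\log\sum_{e'\in C_s}\exp\big(\lambda^{\top}h(f_s,e')+l(e',e_s)\big)\Big]$$

$$p_s(e')=\frac{\exp\big(\lambda^{\top}h(f_s,e')+l(e',e_s)\big)}{\sum_{e''\in C_s}\exp\big(\lambda^{\top}h(f_s,e'')+l(e'',e_s)\big)}$$

$$\nabla\mathcal{L}(\lambda)=\sum_{s=1}^{n}\Big(-h(e_s,f_s)+\mathbb{E}_{p_s}\big[h(f_s,e')\big]\Big)$$

$$\nabla^{2}\mathcal{L}(\lambda)=\sum_{s=1}^{n}\operatorname{Cov}_{p_s}\big[h(f_s,e')\big]\succeq 0$$

Because the Hessian is positive semidefinite everywhere, any local minimum of this objective is also a global minimum.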
Publication number: US2016004691(A1) Publication date: 2016.01.07
Application number: US201314763505 Filing date: 2013.12.02
Applicant: HARBIN INSTITUTE OF TECHNOLOGY Inventors: CAO Hailong; ZHANG Wenwen; LIU Lemao; ZHAO Tiejun; YANG Muyun; ZHENG Dequan; ZHU Conghui; XU Bing
Classification: G06F17/28 Main classification: G06F17/28
Agency: (none listed) Agent: (none listed)
Main claim: 1. A parameter adjustment method for statistical machine translation, characterized in that the method comprises the following steps:
Step 1: building the language model required for translation from a monolingual corpus;
Step 2: building a phrase translation model from a bilingual parallel corpus;
Step 3: adjusting the parameters λ_m by means of the objective function
$$\min_{\lambda}\sum_{s=1}^{n}\left[-\sum_{m=1}^{M}\lambda_m h_m(e_s,f_s)+\log\sum_{e'\in C_s}\exp\left\{\sum_{m=1}^{M}\lambda_m h_m(f_s,e')+l(e',e_s)\right\}\right],$$
where e_s is the reference translation, e′ is a machine translation, f_s is the source-language sentence to be translated, and h_m(e_s, f_s) and h_m(f_s, e′) are the features used in building the translation system. The features fall into four main categories: language model, phrase translation table, sequence (reordering) model, and word penalty terms; m = 1, …, M, where M is the total number of features. l(e′, e_s) is the cost function, and C_s is the set of candidate machine translations, with e′ ∈ C_s.
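Step 3 can be sketched as a small optimizer. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes that each tuning sentence comes with a precomputed reference feature vector h(e_s, f_s) and a fixed candidate list C_s of (h(f_s, e′), l(e′, e_s)) pairs, with the cost already computed (for example, from a sentence-level metric; the patent does not specify one). It also assumes plain gradient descent as the optimizer, which the source does not name. The helpers `objective_and_grad` and `train` and the data layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical data layout (not from the patent): per tuning sentence s,
# the reference feature vector h(e_s, f_s) and a fixed candidate list C_s
# of (h(f_s, e'), l(e', e_s)) pairs with the cost l precomputed.
Candidate = Tuple[Sequence[float], float]
Sentence = Tuple[Sequence[float], List[Candidate]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def logsumexp(xs: Sequence[float]) -> float:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def objective_and_grad(lam: Sequence[float],
                       data: Sequence[Sentence]) -> Tuple[float, List[float]]:
    """Return the Step 3 objective and its gradient with respect to lambda."""
    total = 0.0
    grad = [0.0] * len(lam)
    for ref_feats, cands in data:
        # Cost-augmented candidate scores: lambda . h(f_s, e') + l(e', e_s).
        scores = [dot(lam, feats) + cost for feats, cost in cands]
        z = logsumexp(scores)
        total += -dot(lam, ref_feats) + z
        # Gradient: -h(e_s, f_s) + E_p[h(f_s, e')], p proportional to exp(score).
        for j, r in enumerate(ref_feats):
            grad[j] -= r
        for (feats, _), score in zip(cands, scores):
            p = math.exp(score - z)
            for j, h in enumerate(feats):
                grad[j] += p * h
    return total, grad


def train(data: Sequence[Sentence], num_features: int,
          lr: float = 0.1, steps: int = 200) -> List[float]:
    """Plain gradient descent from lambda = 0 (an assumed optimizer)."""
    lam = [0.0] * num_features
    for _ in range(steps):
        _, grad = objective_and_grad(lam, data)
        lam = [w - lr * g for w, g in zip(lam, grad)]
    return lam


if __name__ == "__main__":
    # Toy example: the reference is in C_s with cost 0; a wrong candidate has cost 1.
    toy = [([1.0, 0.0], [([1.0, 0.0], 0.0), ([0.0, 1.0], 1.0)])]
    weights = train(toy, num_features=2)
    print(weights, objective_and_grad(weights, toy)[0])
```

Because C_s is held fixed inside `train`, the objective is convex in λ. A sufficiently small step size therefore affects only the speed of convergence, not which minimum is reached. In the toy run, the weight on the reference's feature grows relative to the weight on the wrong candidate's feature.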
Address: Harbin, Heilongjiang CN