Title of invention |
SAMPLING AND OPTIMIZATION IN PHRASE-BASED MACHINE TRANSLATION USING AN ENRICHED LANGUAGE MODEL REPRESENTATION |
Abstract |
Rejection sampling is performed to acquire at least one target language translation for a source language string s in accordance with a phrase-based statistical translation model p(x) = p(t, a | s), where t is a candidate translation, a is a candidate alignment comprising a biphrase sequence generating the candidate translation t, and x is a sequence representing the candidate alignment a. The rejection sampling uses a proposal distribution comprising a weighted finite state automaton (WFSA) q^(n) that is refined responsive to rejection of a sample x* obtained in a current iteration of the rejection sampling to generate a refined WFSA q^(n+1) for use in a next iteration of the rejection sampling. The refined WFSA q^(n+1) is selected to satisfy the criteria p(x) ≤ q^(n+1)(x) ≤ q^(n)(x) for all x ∈ X and q^(n+1)(x*) < q^(n)(x*), where the space X is the set of sequences x corresponding to candidate alignments a that generate candidate translations t for the source language string s. |
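The refinement loop described in the abstract can be illustrated with a minimal Python sketch. It rests on strong simplifying assumptions that are not part of the patent record: the space X is a small finite set held in a dictionary, the proposal q is a plain lookup table standing in for a WFSA, and refinement simply lowers the bound at the rejected point x* to p(x*). The function name `adaptive_rejection_sample` and the toy weights are hypothetical.

```python
import random


def adaptive_rejection_sample(p, q0, rng, max_iters=10_000):
    """Draw one sample from the distribution proportional to p.

    p  : dict mapping each x in X to an unnormalized target weight p(x)
    q0 : dict with q0[x] >= p[x] for every x (initial upper bound)
    Returns (accepted x, number of rejections, final bound q).
    Assumption: a dict stands in for the WFSA proposal of the abstract.
    """
    q = dict(q0)  # current proposal; copied so the caller's q0 is untouched
    xs = list(q)
    for n in range(max_iters):
        # Sample x* from the normalized current proposal.
        x_star = rng.choices(xs, weights=[q[x] for x in xs], k=1)[0]
        # Accept with probability p(x*) / q(x*).
        if rng.random() * q[x_star] < p[x_star]:
            return x_star, n, q
        # Rejection implies p(x*) < q(x*). Lower the bound at x* only, so that
        # p(x) <= q_new(x) <= q(x) for all x and q_new(x*) < q(x*).
        q[x_star] = p[x_star]
    raise RuntimeError("no sample accepted within max_iters")


if __name__ == "__main__":
    # Hypothetical toy space of three candidate sequences.
    p_toy = {"a b": 0.2, "b a": 0.05, "a a": 0.0}
    q0_toy = {x: 1.0 for x in p_toy}
    x, rejections, q_final = adaptive_rejection_sample(p_toy, q0_toy, random.Random(0))
    print(x, rejections, q_final)
```

In this table-based stand-in each rejection tightens the bound at a single point. A WFSA-based refinement would instead be expected to lower the weight of many sequences sharing some context with x* at once, which the sketch does not attempt to model.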
Application publication number |
US2014214397(A1) |
Application publication date |
2014.07.31 |
Application number |
US201313750338 |
Filing date |
2013.01.25 |
Applicant |
XEROX CORPORATION |
Inventors |
Dymetman Marc; Aziz Wilker Ferreira; Venkatapathy Sriram |
Classification number |
G06F17/28 |
Main classification number |
G06F17/28 |
Agency |
|
Agent |
|
Principal claim |
1. A non-transitory storage medium storing instructions executable by an electronic data processing device to perform rejection sampling to acquire at least one accepted target language translation for a source language string s in accordance with a phrase-based statistical translation model
p(x) = p(t, a | s) where t is a candidate translation, a is a candidate alignment comprising a source language-target language biphrase sequence generating the candidate translation t, and x is a sequence representing the candidate alignment a, the rejection sampling using a proposal distribution comprising a weighted finite state automaton (WFSA) q^(n) that is refined responsive to rejection of a sample x* obtained in a current iteration of the rejection sampling to generate a refined WFSA q^(n+1) for use in a next iteration of the rejection sampling wherein the refined WFSA q^(n+1) is selected to satisfy the criteria:
p(x) ≤ q^(n+1)(x) ≤ q^(n)(x) for all x ∈ X; and q^(n+1)(x*) < q^(n)(x*); where the space X is the set of sequences x corresponding to candidate alignments a that generate candidate translations t for the source language string s. |
Address |
Norwalk CT US |