Title of invention System and method for incrementally updating a reordering model for a statistical machine translation system
Abstract A method for updating a reordering model of a statistical machine translation system includes, at a first time, receiving new training data for retraining an existing statistical machine translation system, the new training data including at least one sentence pair, each pair including a source sentence in a source language and a target sentence in a target language. Phrase pairs are extracted from the new training data and used to generate a new reordering file. A reordering model of the existing statistical machine translation system is updated, based on the new reordering file. The reordering model includes a reordering table. At a second time after the first time, new training data is received. The extracting of phrase pairs, the generating of the new reordering file, and the updating of the reordering model are reiterated, based on the new training data received at the second time.
Publication number US9442922(B2) Publication date 2016.09.13
Application number US201414546424 Filing date 2014.11.18
Applicant XEROX CORPORATION Inventor Mirkin Shachar
Classification G06F17/28 Main classification G06F17/28
Agency Fay Sharpe LLP Agent Fay Sharpe LLP
Principal claim 1. A method for updating a reordering model of a statistical machine translation system comprising: at a first time, receiving new training data for retraining an existing statistical machine translation system, the new training data comprising at least one sentence pair, each of the at least one sentence pair comprising a source sentence in a source language and a target sentence in a target language; extracting phrase pairs from the new training data, each phrase pair including a source language phrase and a target language phrase; generating a new reordering file from the extracted phrase pairs, the new reordering file including a set of the phrase pairs extracted from the new training data; updating a reordering model of the existing statistical machine translation system based on the new reordering file, the reordering model including a reordering table, the reordering table comprising phrase pairs and a set of features, the set of features comprising, for each of a set of orientation types, at least one feature which is a function of a count of the orientation type for the respective phrase pair, each phrase pair in the reordering table occurring only once, and wherein the updating of the reordering model includes merging an existing reordering table with the new reordering file or merging the existing reordering table with a new reordering table generated from the new reordering file, the merging including updating feature scores for each of the orientation types for at least some of the phrase pairs based on the counts stored in the existing reordering table; at a second time after the first time, receiving new training data for training the existing statistical machine translation system, the new training data comprising at least one sentence pair, the sentence pair comprising a source sentence in the source language and a target sentence in the target language; and reiterating the extracting of phrase pairs, generating of the new reordering file and the updating the reordering model based on the new training data received at the second time, wherein at least one of the extracting phrase pairs, generating the new reordering file, and updating the reordering model is performed with a computer processor.
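The merging step recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes three orientation types (monotone, swap, discontinuous, as commonly used in lexicalized reordering models), a reordering file represented as a list of `(source phrase, target phrase, orientation)` records, and an additive-smoothing formula for the feature scores. The names `update_table` and `feature_scores` and the `smoothing` parameter are hypothetical and do not come from the patent.

```python
from collections import Counter

# Assumed orientation types; the claim only requires "a set of orientation types".
ORIENTATIONS = ("monotone", "swap", "discontinuous")


def update_table(table, reordering_file):
    """Merge a new reordering file into an existing reordering table.

    table maps (source_phrase, target_phrase) -> Counter of orientation counts,
    so each phrase pair occurs only once. reordering_file is an iterable of
    (source_phrase, target_phrase, orientation) records (an assumed format).
    """
    for src, tgt, orientation in reordering_file:
        if orientation not in ORIENTATIONS:
            raise ValueError(f"unknown orientation: {orientation}")
        # Existing counts are kept and incremented, so nothing is recomputed
        # from the older training data.
        table.setdefault((src, tgt), Counter())[orientation] += 1
    return table


def feature_scores(counts, smoothing=0.5):
    """Turn stored counts into one score per orientation type.

    Additive smoothing is an illustrative choice of "function of a count".
    """
    total = sum(counts[o] for o in ORIENTATIONS) + smoothing * len(ORIENTATIONS)
    return {o: (counts[o] + smoothing) / total for o in ORIENTATIONS}


# First time: build the table from an initial reordering file.
table = update_table({}, [
    ("la maison", "the house", "monotone"),
    ("la maison", "the house", "monotone"),
    ("bleue", "blue", "swap"),
])

# Second time: merge a new reordering file into the same table.
update_table(table, [("la maison", "the house", "swap")])

scores = feature_scores(table[("la maison", "the house")])
```

Because the table keeps raw counts rather than only the derived scores, each later batch of training data needs only an increment and a rescoring of the affected phrase pairs.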
Address Norwalk CT US