Main claim |
1. A method comprising:
training a statistical machine translation model which outputs a score for a candidate translation, in a target language, of a text string in a source language, the training comprising:
learning a weight for each of a set of lexical coverage features that are aggregated in the statistical machine translation model, the lexical coverage features comprising a lexical coverage feature for each of a plurality of parallel corpora, each of the lexical coverage features representing a relative number of words contributed by a respective one of the parallel corpora to the translation of the text string, the lexical coverage features being computed based on membership statistics which represent the membership, in each of the plurality of parallel corpora, of each biphrase used in generating the candidate translation, each parallel corpus corresponding to a respective domain from a set of domains and comprising pairs of text strings, each pair comprising a source text string in the source language and a target text string in the target language; and
using the trained model in a statistical machine translation system for translation of a new source text string in the source language,
wherein the training is performed with a computer processor. |
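The lexical coverage features recited in the claim can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the corpus names, the choice to count source-side words, the normalization by the total number of covered words, and the weight values are all assumptions made for illustration. The weight-learning step itself is not shown; the weights are simply given.

```python
from typing import Dict, List, Set, Tuple

# A biphrase is modeled here as a (source phrase, target phrase) pair.
Biphrase = Tuple[str, str]


def lexical_coverage_features(
    biphrases: List[Biphrase],
    corpora: Dict[str, Set[Biphrase]],
) -> Dict[str, float]:
    """Return one feature per corpus (hypothetical helper).

    Each feature is the share of source words that were translated by
    biphrases found in that corpus. Counting source-side words and
    dividing by the total word count are assumptions of this sketch.
    """
    total_words = sum(len(src.split()) for src, _ in biphrases)
    features: Dict[str, float] = {}
    for name, members in corpora.items():
        # Membership statistic: is this biphrase present in the corpus?
        covered = sum(
            len(src.split()) for src, tgt in biphrases if (src, tgt) in members
        )
        features[name] = covered / total_words if total_words else 0.0
    return features


def aggregate_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the per-corpus features (hypothetical helper)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Biphrases used to build one candidate translation (illustrative data).
used_biphrases = [
    ("the patient", "le patient"),
    ("takes", "prend"),
    ("aspirin", "aspirine"),
]

# One parallel corpus per domain, reduced to the set of biphrases it contains.
domain_corpora = {
    "medical": {("the patient", "le patient"), ("aspirin", "aspirine")},
    "news": {("takes", "prend")},
}

features = lexical_coverage_features(used_biphrases, domain_corpora)
# Weights would be learned during training; these values are made up.
score = aggregate_score(features, {"medical": 2.0, "news": 1.0})
print(features, score)
```

In this sketch a biphrase found in several corpora counts toward each of them, so the features need not sum to one. In a full system, the aggregated value would be one component of the model score alongside other features.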