Title of invention |
METHOD AND DEVICE FOR GENERATING TRAINING DATA FOR TRAINING STATISTICAL MACHINE TRANSLATION DEVICE, PARAPHRASE DEVICE, METHOD FOR TRAINING THE SAME, AND DATA PROCESSING SYSTEM AND COMPUTER PROGRAM FOR THE METHOD |
Abstract |
PROBLEM TO BE SOLVED: To provide a method for shortening a sentence without omitting information. SOLUTION: The method for generating training data for training a statistical machine translation device 28 comprises: a step of preparing a corpus including a plurality of sentences of a prescribed language; a step of clustering similar sentences in the corpus 12 into a plurality of clusters 16; a step 18 of selecting clusters of a granularity chosen from among the plurality of clusters 16; a step 18 of selecting, in each cluster of the selected granularity, one sentence whose length satisfies a prescribed criterion; and a step 18 of pairing each sentence in each cluster of the selected granularity with the selected sentence. COPYRIGHT: (C)2004, JPO&NCIPI |
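The steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not name a similarity measure, a clustering algorithm, or the length criterion, so the token-overlap (Jaccard) similarity, the greedy single-pass clustering, the `threshold` parameter standing in for the cluster level (granularity), and the "fewest tokens" criterion are all assumptions, as are the function names.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two sentences (assumed similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def cluster(corpus: List[str], threshold: float) -> List[List[str]]:
    """Greedy single-pass clustering (assumed algorithm).

    `threshold` is a hypothetical stand-in for the selected granularity:
    a higher value yields more, tighter clusters.
    """
    clusters: List[List[str]] = []
    for sentence in corpus:
        for members in clusters:
            # Join the first cluster whose seed sentence is similar enough.
            if jaccard(sentence, members[0]) >= threshold:
                members.append(sentence)
                break
        else:
            # No sufficiently similar cluster exists, so start a new one.
            clusters.append([sentence])
    return clusters


def make_training_pairs(corpus: List[str], threshold: float = 0.4) -> List[Tuple[str, str]]:
    """Pair every sentence in a cluster with that cluster's selected sentence."""
    pairs: List[Tuple[str, str]] = []
    for members in cluster(corpus, threshold):
        # Assumed length criterion: the sentence with the fewest tokens.
        target = min(members, key=lambda s: len(s.split()))
        for source in members:
            pairs.append((source, target))
    return pairs
```

Each resulting pair is (original sentence, shorter sentence from the same cluster), the kind of source/target data a statistical translation model could be trained on to act as a paraphraser. In this sketch the selected sentence is also paired with itself; whether the original method keeps or drops such identity pairs is not stated in the abstract.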
Publication number |
JP2004252495(A) |
Publication date |
2004.09.09 |
Application number |
JP20020272481 |
Filing date |
2002.09.19 |
Applicant |
ADVANCED TELECOMMUNICATION RESEARCH INSTITUTE INTERNATIONAL |
Inventors |
ANDREW FINCH;WATANABE TARO;SUMIDA EIICHIRO |
Classification number |
G06F17/28 |
Main classification number |
G06F17/28 |
Patent agency |
|
Agent |
|
Main claim |
|
Address |
|