Title of Invention: METHOD AND DEVICE FOR GENERATING TRAINING DATA FOR TRAINING STATISTICAL MACHINE TRANSLATION DEVICE, PARAPHRASE DEVICE, METHOD FOR TRAINING THE SAME, AND DATA PROCESSING SYSTEM AND COMPUTER PROGRAM FOR THE METHOD
Abstract:
PROBLEM TO BE SOLVED: To provide a method for shortening a sentence without losing information.
SOLUTION: The method for generating training data for training a statistical machine translation device 28 comprises: a step of preparing a corpus containing a plurality of sentences in a prescribed language; a step of clustering similar sentences in the corpus 12 into a plurality of clusters 16; a step 18 of selecting, from among the plurality of clusters 16, the clusters of one granularity; a step 18 of selecting, in each cluster of the selected granularity, one sentence whose length satisfies a prescribed criterion; and a step 18 of pairing, in each cluster of the selected granularity, each sentence with the one selected sentence.
COPYRIGHT: (C)2004,JPO&NCIPI
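The pairing procedure in the abstract can be illustrated with a minimal sketch. The abstract does not say how sentence similarity is measured, which clustering algorithm is used, or what the "prescribed criterion" on length is, so the following are assumptions made only for illustration: word-set Jaccard similarity, single-link clustering in which a similarity threshold plays the role of the granularity, and "shortest sentence by word count" as the length criterion. The function names `jaccard`, `cluster`, and `make_training_pairs` are hypothetical and do not come from the patent.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity (an assumed stand-in similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cluster(sentences: list[str], threshold: float) -> list[list[str]]:
    """Single-link clustering: sentences with similarity >= threshold are merged.

    The threshold acts as the (assumed) granularity knob: a higher threshold
    gives finer, smaller clusters.
    """
    parent = list(range(len(sentences)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(sentences)), 2):
        if jaccard(sentences[i], sentences[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, s in enumerate(sentences):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


def make_training_pairs(sentences: list[str], threshold: float) -> list[tuple[str, str]]:
    """Pair each sentence in a cluster with that cluster's shortest sentence.

    Each pair is (source, shortened target), usable as parallel training data
    for a sentence-shortening translation model. Self-pairs are skipped here,
    which is an assumption, not something stated in the abstract.
    """
    pairs: list[tuple[str, str]] = []
    for members in cluster(sentences, threshold):
        target = min(members, key=lambda s: len(s.split()))
        pairs.extend((s, target) for s in members if s != target)
    return pairs
```

With a toy corpus such as "could you please tell me where the station is", "where is the station", and "please tell me where the station is", a threshold of 0.4 places all three in one cluster and pairs the two longer sentences with "where is the station". Raising the threshold splits clusters and yields fewer, more conservative pairs, which mirrors the abstract's step of choosing a cluster granularity.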
Publication Number: JP2004252495(A)  Publication Date: 2004.09.09
Application Number: JP20020272481  Application Date: 2002.09.19
Applicant: ADVANCED TELECOMMUNICATION RESEARCH INSTITUTE INTERNATIONAL  Inventors: ANDREW FINCH; WATANABE TARO; SUMIDA EIICHIRO
Classification: G06F17/28  Main Classification: G06F17/28