Title of invention DEVICE FOR SIMULTANEOUSLY SEGMENTING BILINGUAL CORPUS, AND COMPUTER PROGRAM THEREFOR
Abstract PROBLEM TO BE SOLVED: To provide a device for simultaneously segmenting source and target token sequences without overfitting. SOLUTION: The device includes a storage part that stores first and second sequences; a simultaneous segmentation device that simultaneously segments each block pair of the first and second sequences; a counter 74 that counts the phrase pairs that are generated; a sample extractor 88 that samples a block pair at random; a subtractor 100 that subtracts one from the counts of the phrase pairs in the sampled block pair; a calculator 102 that calculates the probabilities of all simultaneous segmentations possible for the sampled block pair; a sample extractor 106 that samples one of the possible simultaneous segmentations in accordance with the calculated probabilities; an update part 108 that updates the phrase pair counts; and a repetition control part 90 that causes the sample extractor 88 through the update part 108 to operate repeatedly until an end condition is met. COPYRIGHT: (C)2012, JPO&INPIT
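The loop described in the abstract (sample a block pair, remove its phrase pair counts, score every possible co-segmentation, sample one, restore the counts, repeat) can be illustrated with a minimal Python sketch. This is not the patented implementation. The abstract does not specify the probability model, so the sketch assumes a Chinese-restaurant-process-style weight with a hypothetical length-decaying base measure (`base_prob`, `decay`, `alpha`), monotone segmentations with non-empty phrases on both sides, and a fixed iteration count as the end condition. All function and parameter names are illustrative.

```python
import random
from collections import Counter


def co_segmentations(src, tgt):
    """Yield every monotone co-segmentation of src and tgt as a list of phrase pairs."""
    if not src and not tgt:
        yield []
        return
    if not src or not tgt:
        return  # assumption: both sides of a phrase pair must be non-empty
    for i in range(1, len(src) + 1):
        for j in range(1, len(tgt) + 1):
            head = (tuple(src[:i]), tuple(tgt[:j]))
            for rest in co_segmentations(src[i:], tgt[j:]):
                yield [head] + rest


def base_prob(pair, decay=0.1):
    """Hypothetical base measure: longer phrase pairs are less likely a priori."""
    s, t = pair
    return decay ** (len(s) + len(t))


def seg_weight(seg, counts, total, alpha):
    """Unnormalised CRP-style weight of one segmentation given the remaining counts."""
    local = Counter()  # pairs already generated inside this segmentation
    w = 1.0
    for n, pair in enumerate(seg):
        w *= (counts[pair] + local[pair] + alpha * base_prob(pair)) / (total + n + alpha)
        local[pair] += 1
    return w


def gibbs_co_segment(blocks, iterations=100, alpha=1.0, seed=0):
    """Resample the co-segmentation of one randomly chosen block pair per iteration."""
    rng = random.Random(seed)
    # Initial state: each block pair is a single phrase pair.
    current = [[(tuple(s), tuple(t))] for s, t in blocks]
    counts = Counter(p for seg in current for p in seg)
    for _ in range(iterations):  # assumed end condition: fixed iteration count
        k = rng.randrange(len(blocks))          # sample a block pair at random
        for p in current[k]:                    # subtract its phrase pairs
            counts[p] -= 1
        total = sum(counts.values())
        candidates = list(co_segmentations(*blocks[k]))
        weights = [seg_weight(c, counts, total, alpha) for c in candidates]
        current[k] = rng.choices(candidates, weights=weights, k=1)[0]
        counts.update(current[k])               # update the phrase pair counts
    return current, +counts                     # unary plus drops zero-count entries


if __name__ == "__main__":
    blocks = [("abc", "xyz"), ("ab", "xy"), ("bc", "yz")]
    segs, counts = gibbs_co_segment(blocks, iterations=200)
    for seg in segs:
        print(seg)
```

Exhaustive enumeration is only practical for short sequences, which is why the sketch is limited to toy inputs. A practical system would need a more efficient way to score candidate segmentations.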
Publication number JP2012093808(A) Publication date 2012.05.17
Application number JP20100238098 Filing date 2010.10.25
Applicant NATIONAL INSTITUTE OF INFORMATION & COMMUNICATION TECHNOLOGY Inventors ANDREW FINCH;SUMIDA EIICHIRO
Classification G06F17/28;G06F17/27 Main classification G06F17/28
Agency Agent
Main claim
Address