Title of Invention |
DEVICE FOR SIMULTANEOUSLY SEGMENTING BILINGUAL CORPUS, AND COMPUTER PROGRAM THEREFOR |
Abstract |
PROBLEM TO BE SOLVED: To provide a device that simultaneously segments the source and target token sequences of a bilingual corpus without the problem of overfitting (overlearning). SOLUTION: The device includes: a storage part that stores first and second sequences; a simultaneous segmentation device that simultaneously segments each block pair of the first and second sequences; a counter 74 that counts the occurrences of the generated phrase pairs; a sample extractor 88 that samples a block pair at random; a subtractor 100 that subtracts one from the counts of the phrase pairs in the sampled block pair; a calculator 102 that calculates the probability of every possible simultaneous segmentation of the sampled block pair; a sample extractor 106 that samples one of those segmentations according to the calculated probabilities; an update part 108 that updates the phrase-pair counts; and a repetition control part 90 that causes the sample extractor 88 through the update part 108 to operate repeatedly until an end condition is met. COPYRIGHT: (C)2012,JPO&INPIT |
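The procedure in the abstract reads like a blocked Gibbs sampler over joint segmentations. The following is a minimal Python sketch of that loop under assumptions that are not stated in the abstract: the probability of a phrase pair is taken to be a Chinese-restaurant-process (Dirichlet process) predictive probability with a hypothetical concentration `ALPHA` and a hypothetical base measure; sequences are character strings; every phrase pair is non-empty on both sides and at most `MAX_LEN` tokens long; and the end condition is a fixed iteration count. All names are hypothetical. Roughly, `counts` plays the role of counter 74, `rng.randrange` of sample extractor 88, `counts.subtract` of subtractor 100, the forward table of calculator 102, the backward pass of sample extractor 106, `counts.update` of update part 108, and the outer loop of repetition control part 90.

```python
import random
from collections import Counter

# Hypothetical hyperparameters; none of these values come from the patent.
ALPHA = 1.0     # concentration parameter of the assumed Dirichlet process
MAX_LEN = 3     # maximum phrase length on each side of a phrase pair
P_STOP = 0.5    # geometric phrase-length parameter in the assumed base measure


def base_measure(src, tgt, vs, vt):
    """Assumed base distribution G0: geometric lengths, uniform symbols per side."""
    p_src = P_STOP * (1 - P_STOP) ** (len(src) - 1) * (1.0 / vs) ** len(src)
    p_tgt = P_STOP * (1 - P_STOP) ** (len(tgt) - 1) * (1.0 / vt) ** len(tgt)
    return p_src * p_tgt


def pair_prob(pair, counts, total, vs, vt):
    """Chinese-restaurant-process predictive probability of one phrase pair."""
    return (counts[pair] + ALPHA * base_measure(*pair, vs, vt)) / (total + ALPHA)


def sample_segmentation(src, tgt, counts, total, vs, vt, rng):
    """Sum over all joint segmentations (forward pass), then sample one (backward pass).

    Counts are held fixed inside the block, an approximation to exact block sampling.
    """
    n, m = len(src), len(tgt)
    # fwd[i][j]: summed probability of all joint segmentations of src[:i] and tgt[:j]
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for k in range(max(0, i - MAX_LEN), i):
                for l in range(max(0, j - MAX_LEN), j):
                    if fwd[k][l] > 0.0:
                        fwd[i][j] += fwd[k][l] * pair_prob(
                            (src[k:i], tgt[l:j]), counts, total, vs, vt)
    if fwd[n][m] == 0.0:
        raise ValueError("no segmentation satisfies the MAX_LEN constraint")
    # Backward pass: draw the last phrase pair in proportion to its weight, then repeat.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        cands, weights = [], []
        for k in range(max(0, i - MAX_LEN), i):
            for l in range(max(0, j - MAX_LEN), j):
                w = fwd[k][l] * pair_prob((src[k:i], tgt[l:j]), counts, total, vs, vt)
                if w > 0.0:
                    cands.append((k, l))
                    weights.append(w)
        k, l = rng.choices(cands, weights=weights)[0]
        pairs.append((src[k:i], tgt[l:j]))
        i, j = k, l
    return pairs[::-1]


def gibbs_segment(corpus, iterations, seed=0):
    """Resample one randomly chosen block pair per iteration (hypothetical driver)."""
    rng = random.Random(seed)
    vs = len({c for s, _ in corpus for c in s})
    vt = len({c for _, t in corpus for c in t})
    counts, segs = Counter(), []
    for src, tgt in corpus:  # initial segmentation, sampled sequentially
        seg = sample_segmentation(src, tgt, counts, sum(counts.values()), vs, vt, rng)
        segs.append(seg)
        counts.update(seg)  # phrase-pair occurrence counts
    for _ in range(iterations):  # end condition: fixed iteration count (assumption)
        b = rng.randrange(len(corpus))  # pick a block pair at random
        counts.subtract(segs[b])  # remove its phrase pairs from the counts
        src, tgt = corpus[b]
        segs[b] = sample_segmentation(src, tgt, counts, sum(counts.values()), vs, vt, rng)
        counts.update(segs[b])  # add back the newly sampled phrase pairs
    return segs, +counts  # unary plus drops zero entries
```

For example, `gibbs_segment([("abc", "xyz"), ("ab", "xy"), ("c", "z")], 100)` returns one phrase-pair segmentation per block pair together with the final counts. The forward table sums over all segmentations without enumerating them one by one, which is one way to realize "the probability of every possible simultaneous segmentation"; an exact blocked sampler would also account for count changes within the block (for instance with a Metropolis-Hastings correction), which this sketch omits.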
Application Publication Number |
JP2012093808(A) |
Application Publication Date |
2012.05.17 |
Application Number |
JP20100238098 |
Application Date |
2010.10.25 |
Applicant |
NATIONAL INSTITUTE OF INFORMATION & COMMUNICATIONS TECHNOLOGY |
Inventors |
ANDREW FINCH;SUMIDA EIICHIRO |
Classification Numbers |
G06F17/28;G06F17/27 |
Main Classification Number |
G06F17/28 |
Agency |
|
Agent |
|
Principal Claim |
|
Address |
|