Title of invention CORPUS CAPACITY REDUCTION DEVICE, METHOD AND COMPUTER PROGRAM THEREFOR
Abstract PROBLEM TO BE SOLVED: To reduce the capacity of an existing corpus while preventing deterioration of the perplexity of a language model. SOLUTION: The corpus capacity reduction device 20 includes a candidate sentence selection part 40, which sequentially selects the sentences in a first corpus 22 as candidate sentences for addition to a second corpus 24; a three sentence set selection part 44, which, for each selected candidate sentence, selects all sets of three sentences drawn from the second corpus 24; an analogical relation decision part 46, which decides whether an analogical relation holds between the candidate sentence and any of the three-sentence sets selected by the three sentence set selection part 44; and a candidate document writing part 42, which adds to the second corpus 24 each candidate sentence for which the analogical relation decision part 46 found no analogical relation. COPYRIGHT: (C)2005,JPO&NCIPI
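The selection loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `reduce_corpus` and `counts_balance` are hypothetical, and the abstract does not say how the analogical relation is decided. As a stand-in, the sketch uses a character-count balance test (for a : b :: c : d, every character must occur as often in a and d together as in b and c together), which is only a necessary condition for an analogy between strings, so a real decision part would need a stricter test. Ordered triples are assumed because the order of the terms in an analogy matters.

```python
from collections import Counter
from itertools import permutations


def counts_balance(a, b, c, d):
    """Stand-in test for a : b :: c : d (an assumption, not the patent's method).

    Checks only that character counts balance: count(a) + count(d)
    equals count(b) + count(c) for every character.
    """
    return Counter(a) + Counter(d) == Counter(b) + Counter(c)


def reduce_corpus(first_corpus, is_analogy=counts_balance):
    """Build a reduced second corpus from the sentences of the first corpus."""
    second_corpus = []
    # Candidate sentence selection: take sentences one at a time, in order.
    for candidate in first_corpus:
        # Three-sentence set selection: every ordered triple already kept.
        # Analogical relation decision: does any triple account for the candidate?
        explained = any(
            is_analogy(a, b, c, candidate)
            for a, b, c in permutations(second_corpus, 3)
        )
        # Candidate writing: keep only sentences no triple accounts for.
        if not explained:
            second_corpus.append(candidate)
    return second_corpus
```

For example, `reduce_corpus(["walk", "walked", "talk", "talked", "xyz"])` keeps `"walk"`, `"walked"`, `"talk"` and `"xyz"`, and drops `"talked"` because walk : walked :: talk : talked passes the stand-in test. Enumerating all triples costs on the order of the cube of the second corpus size per candidate, so the sketch is only practical for small inputs.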
Publication number JP2005251101(A) Publication date 2005.09.15
Application number JP20040064287 Filing date 2004.03.08
Applicant ADVANCED TELECOMMUNICATION RESEARCH INSTITUTE INTERNATIONAL Inventor YVES LEPAGE;ETIENNE DENOUAL
Classification G06K9/72;G06F17/28;G10L15/06;G10L15/18 Main classification G06K9/72
Agency Agent
Main claim
Address