Title: METHOD AND DEVICE FOR EXPANDING DATA OF BILINGUAL CORPUS, AND STORAGE MEDIUM
Abstract: Disclosed are a method and a device for expanding data of a bilingual corpus. The method includes: searching, in a source language-pivot language corpus, for at least one first pivot language phrase semantically matching a first source language phrase; searching, in the source language-pivot language corpus, for at least one second source language phrase semantically matching each of the first pivot language phrases, the second source language phrases forming a source language phrase set; searching, in a pivot language-target language corpus, for at least one first target language phrase semantically matching each of the first pivot language phrases, the first target language phrases forming a target language phrase set; combining the second source language phrases in the source language phrase set with the first target language phrases in the target language phrase set to form at least one phrase pair in which a source language phrase and a target language phrase semantically match; and storing the at least one phrase pair so formed into a source language-target language corpus. Expanding the data in the bilingual corpus in this way addresses the problem of data sparseness in the bilingual corpus.
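The pivot-based expansion steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each corpus is modeled as a set of `(phrase_a, phrase_b)` pairs, "semantically matching" is assumed to mean that the two phrases appear together as a pair in the corpus (exact string lookup), and the source and target phrase sets are taken as the union over all first pivot phrases before they are combined. The function names `expand_corpus`, `right_matches` and `left_matches` are hypothetical.

```python
from itertools import product
from typing import Set, Tuple

# Hypothetical model: a corpus is a set of phrase pairs assumed to semantically match.
Corpus = Set[Tuple[str, str]]


def right_matches(corpus: Corpus, left: str) -> Set[str]:
    """Right-hand phrases paired with `left` in the corpus."""
    return {b for a, b in corpus if a == left}


def left_matches(corpus: Corpus, right: str) -> Set[str]:
    """Left-hand phrases paired with `right` in the corpus."""
    return {a for a, b in corpus if b == right}


def expand_corpus(first_source: str, src_pivot: Corpus,
                  pivot_tgt: Corpus, src_tgt: Corpus) -> Corpus:
    """Sketch of pivot-based expansion; mutates and extends `src_tgt`."""
    # Step 1: first pivot phrases matching the first source phrase.
    first_pivots = right_matches(src_pivot, first_source)

    # Step 2: second source phrases matching each first pivot phrase
    # (assumption: their union forms the source language phrase set).
    source_set: Set[str] = set()
    for pivot in first_pivots:
        source_set |= left_matches(src_pivot, pivot)

    # Step 3: first target phrases matching each first pivot phrase
    # (assumption: their union forms the target language phrase set).
    target_set: Set[str] = set()
    for pivot in first_pivots:
        target_set |= right_matches(pivot_tgt, pivot)

    # Step 4: combine every source phrase with every target phrase.
    new_pairs = set(product(source_set, target_set))

    # Step 5: store the new pairs in the source-target corpus.
    src_tgt |= new_pairs
    return new_pairs
```

For example, with the hypothetical toy data `src_pivot = {("ni hao", "hello"), ("nin hao", "hello")}` and `pivot_tgt = {("hello", "bonjour"), ("hello", "salut")}`, calling `expand_corpus("ni hao", src_pivot, pivot_tgt, src_tgt)` adds all four combinations of {"ni hao", "nin hao"} with {"bonjour", "salut"} to `src_tgt`, even though no direct source-target pair existed before. Note that the first source phrase itself reappears among the second source phrases, because it also matches the shared pivot phrase.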
Publication number: US2016239481 (A1)   Publication date: 2016.08.18
Application number: US201414892933   Filing date: 2014.09.04
Applicant: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.   Inventors: Zhu Xiaoning; He Zhongjun; Wu Hua; Wang Haifeng
Classification: G06F17/27; G06F17/30; G06F17/28   Main classification: G06F17/27
Principal claim: 1. A method for expanding data of a bilingual corpus, comprising: searching, in a source language-pivot language corpus, for at least one first pivot language phrase semantically matching a first source language phrase; searching, in the source language-pivot language corpus, for at least one second source language phrase semantically matching each of the first pivot language phrases to form a source language phrase set by the second source language phrases; searching, in a pivot language-target language corpus, for at least one first target language phrase semantically matching each of the first pivot language phrases to form a target language phrase set by the first target language phrases; combining the second source language phrases in the source language phrase set with the first target language phrases in the target language phrase set, so as to form at least one phrase pair in which a source language phrase and a target language phrase semantically match; and storing the formed at least one phrase pair in which the source language phrase and the target language phrase semantically match into a source language-target language corpus.
Address: Beijing, CN