Title: Automatic cognate detection in a computer-assisted language learning system
Abstract: According to an aspect, a first word in a first language and a second word in a second language in a bilingual corpus are stemmed. A probability for aligning the first stem and the second stem is calculated, the stems are normalized, and a distance metric between the normalized first stem and the normalized second stem is calculated. When the probability and the distance metric meet a threshold criterion, the first word and the second word are identified as a cognate pair and stored in a set of cognates. A candidate sentence in the second language is retrieved from a corpus. The candidate sentence is filtered by the active vocabulary of a user in the second language and the set of cognates. A sentence quality score is calculated for the candidate sentence, and the candidate sentence is ranked for presentation to the user based on the sentence quality score.
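The cognate test summarized in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the toy suffix-stripping stemmer, diacritic removal as the normalization step, length-normalized Levenshtein distance as the distance metric, a precomputed dictionary `align_probs` standing in for the alignment model, and the threshold values `min_prob` and `max_dist`.

```python
import unicodedata

# Hypothetical suffix list for a toy stemmer (assumption, not from the patent).
SUFFIXES = tuple(
    unicodedata.normalize("NFC", s)
    for s in ("ation", "ación", "tion", "ción", "ing", "es", "s")
)


def stem(word: str) -> str:
    """Lowercase the word and strip one known suffix, keeping at least 3 characters."""
    word = unicodedata.normalize("NFC", word.lower())
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> str:
    """Remove diacritics so that, e.g., 'ó' and 'o' compare as equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_cognate(w1, w2, align_probs, min_prob=0.5, max_dist=0.4):
    """Apply both thresholds: alignment probability and normalized distance."""
    s1, s2 = stem(w1), stem(w2)
    prob = align_probs.get((s1, s2), 0.0)  # assumed to come from an alignment model
    n1, n2 = normalize(s1), normalize(s2)
    dist = edit_distance(n1, n2) / max(len(n1), len(n2), 1)
    return prob >= min_prob and dist <= max_dist


# Illustrative, made-up alignment probabilities keyed by stem pair.
align_probs = {("inform", "inform"): 0.9, ("dog", "perro"): 0.9}
cognates = {
    pair
    for pair in [("information", "información"), ("dog", "perro")]
    if is_cognate(pair[0], pair[1], align_probs)
}
```

With these made-up inputs, only the English/Spanish pair whose stems coincide passes both thresholds; the second pair is rejected on distance even though its assumed alignment probability is high.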
Publication number: US9400781(B1) Publication date: 2016.07.26
Application number: US201615018014 Filing date: 2016.02.08
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: Navratil Jiri; Roukos Salim; Ward Robert T.
Classification: G06F17/27; G06F17/28 Main classification: G06F17/27
Agency: Cantor Colburn LLP Agent: Cantor Colburn LLP; Goudy Kurt
Principal claim: 1. A computer-implemented method for automatic cognate detection, the method comprising: stemming, by a processor, a first word in a first language in a bilingual corpus to obtain a first stem and a second word in a second language in the bilingual corpus to obtain a second stem; calculating, by the processor, a probability for aligning the first stem and the second stem; normalizing, by the processor, the first stem and the second stem; calculating, by the processor, a distance metric between the normalized first stem and the normalized second stem; identifying, by the processor, the first word and the second word as a cognate pair when the probability and the distance metric meet a threshold criterion; storing the cognate pair in a set of cognates; retrieving, by the processor, a candidate sentence in the second language from a corpus; filtering, by the processor, the candidate sentence by an active vocabulary of a user in the second language and the set of cognates; calculating, by the processor, a sentence quality score for the candidate sentence; ranking, by the processor, the candidate sentence based on the sentence quality score; and presenting the ranked candidate sentence as a pure or combined audio, graphic, textual, or video stimulus to the user.
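The sentence filtering and ranking steps of the claim can be sketched in the same hedged way. The claim does not say how the filter or the sentence quality score is defined, so this Python sketch assumes one plausible reading: a sentence is kept when at most `max_unknown` of its words fall outside the union of the user's active vocabulary and the cognate words, and the score is the fraction of words the user is expected to understand. The function names, the regular-expression tokenizer, and the sample data are all hypothetical.

```python
import re


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into lowercase word tokens (simplistic, for illustration)."""
    return re.findall(r"\w+", sentence.lower())


def rank_candidates(candidates, active_vocab, cognate_words, max_unknown=1):
    """Filter candidate sentences, then rank them by an assumed quality score."""
    # Words treated as understandable: known vocabulary plus detected cognates.
    known = set(active_vocab) | set(cognate_words)
    scored = []
    for sentence in candidates:
        tokens = tokenize(sentence)
        if not tokens:
            continue
        unknown = [t for t in tokens if t not in known]
        if len(unknown) > max_unknown:
            continue  # filtered out: too many words beyond the user's reach
        score = 1.0 - len(unknown) / len(tokens)  # assumed quality score
        scored.append((score, sentence))
    # Highest score first; ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored]


# Made-up learner data for a Spanish-learning user.
active_vocab = {"el", "gato", "es"}
cognate_words = {"animal", "importante"}
candidates = [
    "El gato es grande",
    "El gato come pescado fresco",
    "El animal es importante",
]
ranked = rank_candidates(candidates, active_vocab, cognate_words)
```

Treating cognates as understandable widens the pool of usable sentences: in this example the sentence built from cognates ranks first even though two of its words are outside the user's active vocabulary.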
Address: Armonk NY US