Title of invention METHOD AND APPARATUS FOR TERMINOLOGY TRANSLATION
Abstract The invention relates to a method and apparatus for generating translations of natural language terms from a first language to a second language. A plurality of terms are extracted from unaligned comparable corpora of the first and second languages. Comparable corpora are sets of documents in different languages that come from the same domain and have similar genre and content. Unaligned documents are not translations of one another and are not linked in any other way. By accessing monolingual thesauri of the first and second languages, a category is assigned to each extracted term. Then, category-to-category translation probabilities are estimated, and using said category-to-category translation probabilities, term-to-term translation probabilities are estimated. The invention preferably exploits class-based normalization of probability estimates, bi-directionality, and relative frequency normalization. The most important applications are cross-language text retrieval, semi-automatic bilingual thesaurus enhancement, and machine-aided human translation.
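The abstract states that term-to-term translation probabilities are derived from category-to-category translation probabilities, but it gives no formula. The following is only a minimal sketch of one way such a step could look, not the patented method. It assumes the category-to-category probabilities are already available as input, and it assumes a simple scoring rule: the category probability multiplied by the relative frequency of the target term within its own category. The function name `term_to_term_probs`, the category labels, and all sample data are hypothetical.

```python
from collections import Counter, defaultdict


def term_to_term_probs(src_category, tgt_term_counts, tgt_categories, cat_probs):
    """Score target terms as candidate translations of one source term.

    Assumed rule (not taken from the patent):
        score(t) = P(cat(t) | src_category) * freq(t) / freq(cat(t))

    src_category    -- thesaurus category of the source term
    tgt_term_counts -- Counter of target-language term frequencies
    tgt_categories  -- dict mapping each target term to its thesaurus category
    cat_probs       -- dict mapping (src_category, tgt_category) to a probability
    """
    # Total corpus frequency of each target-language category.
    cat_totals = defaultdict(int)
    for term, count in tgt_term_counts.items():
        cat_totals[tgt_categories[term]] += count

    scores = {}
    for term, count in tgt_term_counts.items():
        cat = tgt_categories[term]
        p_cat = cat_probs.get((src_category, cat), 0.0)
        # Relative frequency of the term inside its own category.
        scores[term] = p_cat * count / cat_totals[cat]
    return scores


# Hypothetical sample data: one English source term in category "ENGINES",
# three French candidate terms with made-up corpus frequencies.
tgt_counts = Counter({"moteur": 6, "turbine": 2, "contrat": 4})
tgt_cats = {"moteur": "MACHINERY", "turbine": "MACHINERY", "contrat": "LAW"}
cat_probs = {("ENGINES", "MACHINERY"): 0.8, ("ENGINES", "LAW"): 0.2}

scores = term_to_term_probs("ENGINES", tgt_counts, tgt_cats, cat_probs)
best = max(scores, key=scores.get)
```

With this sample data the highest-scoring candidate is "moteur", because it belongs to the most probable target category and is the most frequent term within that category. Because each category's probability mass is split across its terms by relative frequency, the scores sum to one whenever the category probabilities do.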
Publication number CA2364999(C) Publication date 2005.05.03
Application number CA20012364999 Filing date 2001.12.10
Applicant XEROX CORPORATION Inventor HULL, DAVID
Classification G06F17/27;G06F17/28;(IPC1-7):G06F17/28 Main classification G06F17/27
Agency Agent
Main claim
Address