Title of invention Compact encoding of multi-lingual translation dictionaries
Abstract A computerized multilingual translation dictionary includes a set of words and phrases for each of the languages it contains, plus a mapping that indicates, for each word or phrase in one language, what the corresponding translations in the other languages are. The set of words and phrases for each language is divided up among corresponding concept groups based on an abstract pivot language. The words and phrases are encoded as token numbers, assigned by a word-number mapper, and laid out in a sequence that can be searched fairly rapidly with a simple linear scan. The complex associations of words and phrases to particular pivot-language senses are represented by including a list of pivot-language sense numbers with each word or phrase. The preferred coding of these sense numbers is a bit vector for each word, where each bit corresponds to a particular pivot element in the abstract language, and the bit is ON if the given word is a translation of that pivot element. Determining whether a word in language 1 translates to a word in language 2 then requires only a bit-wise intersection of their associated bit vectors. Each word or phrase is prefixed by its bit-vector token number, so the bit-vector tokens do double duty: they also act as separators between the tokens of one phrase and those of another. A pseudo-Huffman compression scheme is used to reduce the size of the token stream. Because of the frequency skew for the bit-vector tokens, this produces a very compact encoding.
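The encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the pivot senses, the vocabulary, the token numbering (bit-vector tokens first, word tokens after) and all function names are assumptions made for illustration, and the pseudo-Huffman compression step is omitted.

```python
# Hypothetical pivot-language senses; bit i of a vector stands for PIVOT_SENSES[i].
PIVOT_SENSES = ["DOG", "BANK_MONEY", "BANK_RIVER"]


def sense_vector(*senses):
    """Return an integer bit vector with one bit ON per listed pivot sense."""
    vec = 0
    for sense in senses:
        vec |= 1 << PIVOT_SENSES.index(sense)
    return vec


# Assumed layout: a small table of distinct bit vectors. Token numbers below
# len(VECTOR_TABLE) are bit-vector tokens; larger numbers are word tokens.
VECTOR_TABLE = [
    sense_vector("DOG"),
    sense_vector("BANK_MONEY"),
    sense_vector("BANK_RIVER"),
    sense_vector("BANK_MONEY", "BANK_RIVER"),
]
WORDS = ["dog", "bank", "river", "chien", "banque", "rive"]
WORD_IDS = {w: i + len(VECTOR_TABLE) for i, w in enumerate(WORDS)}
ID_WORDS = {i: w for w, i in WORD_IDS.items()}


def encode(entries):
    """Flatten (words, vector) entries into one token stream.

    Each entry is prefixed by its bit-vector token, which also marks
    the boundary between one word or phrase and the next.
    """
    stream = []
    for words, vec in entries:
        stream.append(VECTOR_TABLE.index(vec))
        stream.extend(WORD_IDS[w] for w in words)
    return stream


def decode(stream):
    """Linear scan: a bit-vector token starts a new entry, word tokens extend it."""
    entries = []
    for tok in stream:
        if tok < len(VECTOR_TABLE):
            entries.append(([], VECTOR_TABLE[tok]))
        else:
            entries[-1][0].append(ID_WORDS[tok])
    return [(tuple(words), vec) for words, vec in entries]


def translate(phrase, src_stream, dst_stream):
    """Return target entries whose bit vector intersects the source entry's."""
    src = dict(decode(src_stream))
    vec = src[tuple(phrase.split())]
    return [" ".join(words) for words, v in decode(dst_stream) if v & vec]


ENGLISH = encode([
    (("dog",), sense_vector("DOG")),
    (("bank",), sense_vector("BANK_MONEY", "BANK_RIVER")),
    (("river", "bank"), sense_vector("BANK_RIVER")),
])
FRENCH = encode([
    (("chien",), sense_vector("DOG")),
    (("banque",), sense_vector("BANK_MONEY")),
    (("rive",), sense_vector("BANK_RIVER")),
])

print(translate("bank", ENGLISH, FRENCH))
print(translate("river bank", ENGLISH, FRENCH))
```

In this sketch the ambiguous English word "bank" carries two ON bits, so it intersects both French entries, while the phrase "river bank" carries one bit and matches only "rive". Because a handful of bit-vector tokens recur throughout a real stream, a frequency-based code such as the pseudo-Huffman scheme mentioned in the abstract would assign them the shortest codes.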
Publication number US5523946(A) Publication date 1996.06.04
Application number US19950435242 Filing date 1995.05.05
Applicant XEROX CORPORATION Inventor KAPLAN, RONALD M.;MULLINS, ATTY T.
Classification G06F17/27;G06F17/28;(IPC1-7):G06F17/28 Main classification G06F17/27
Agency Agent
Principal claim
Address