Title of invention: Method, system and medium for character conversion between different regional versions of a language, especially between Simplified Chinese and Traditional Chinese
Abstract: A method, system and medium for character conversion between different regional versions of a language, especially between Simplified Chinese and Traditional Chinese, are provided. The method comprises finding a target character for each source character, for example by looking it up in a desired data resource selected from a plurality of data resources that are managed by a multiple category management model according to the priorities of the data resources. The method may offer users greater flexibility in choosing the data resources most appropriate to their conversion purposes, which may increase the efficiency and accuracy of the conversion. It also does not have to search all the data resources before offering a conversion candidate in each operation, which shortens the running time of the conversion.
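The priority-ordered lookup described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Resource` class, the numeric priorities, the rule that a larger number means a higher priority, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical data resource: a name, a priority, and a mapping table."""
    name: str
    priority: int  # assumption: a larger number means a higher priority
    table: dict = field(default_factory=dict)


def find_target(source, resources):
    """Return (target, resource name) from the highest-priority resource
    that has an entry for `source`.

    Resources are tried in descending priority and the search stops at the
    first hit, so lower-priority resources are not consulted.
    """
    for res in sorted(resources, key=lambda r: r.priority, reverse=True):
        if source in res.table:
            return res.table[source], res.name
    return source, None  # no entry anywhere: leave the text unchanged


# Hypothetical sample data for Simplified -> Traditional conversion.
resources = [
    Resource("one_to_one", 1, {"学": "學", "软": "軟"}),
    Resource("personal", 5, {"软件": "軟體"}),
]
```

Stopping at the first hit is what lets a lookup avoid searching every resource; the cost is that a lower-priority resource is never consulted once a higher-priority one has an entry.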
Publication number: US9311302(B2) Publication date: 2016.04.12
Application number: US201213527314 Filing date: 2012.06.19
Applicant: CITY UNIVERSITY OF HONG KONG Inventors: Zhu Chunshen;Hao Tianyong
Classification: G06F17/28 Main classification: G06F17/28
Agency: Amster, Rothstein & Ebenstein LLP Agent: Amster, Rothstein & Ebenstein LLP
Main claim: 1. A computer-implemented method for character conversion between different regional versions of a language, the method comprising: receiving, by a computer system, an input document comprising a plurality of source characters in a source regional version of a language, wherein the computer system comprises a computer processor and a tangible memory that stores instructions for controlling the computer processor, and wherein the computer system is in communication with a plurality of data resources that store data items regarding the source regional version of the language and other regional versions of the language; finding for each of the source characters, by the computer system, a target character in a target regional version of the language, from the plurality of data resources which are managed by the computer system using multiple categories with their priorities comprising at least one of two kinds of priorities: a relevance priority and an authority priority; performing for all of the source characters of the input document, by the computer system, a conversion from the source regional version of the language to the target regional version of the language based on data items in a desired data resource of the plurality of data resources; and outputting, by the computer system, a target document comprising the converted target characters; wherein the input document comprises one or a combination of a character, a word, a sentence, and m sentences; the word consists of two or more characters; the sentence consists of two or more words; m is an integer greater than or equal to 2; and wherein the source regional version of the language is one of the simplified version and the traditional version of Chinese characters, the target regional version of the language is the other of the simplified version and the traditional version of Chinese characters, the data resources comprise authoritative publications and informal on-line resources, the categories comprise any combination of at least a personal category, a regional terms category, a words category, a one-to-many characters category, and a one-to-one characters category, the categories are indicated by priority items which indicate levels of at least the two kinds of priorities, and finding the target character for the source character and performing the conversion between Simplified Chinese and Traditional Chinese comprises: converting one-to-one characters and tagging one-to-many characters by reverse maximum matching over all the data resources according to the priorities; and matching and converting one-to-many characters using both priority probability and N-Gram-based probability.
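The two final steps of the claim can be illustrated with a short sketch. It is not the patented implementation, and several things in it are assumptions: a single word lexicon stands in for the prioritized data resources, the N-gram model is a bigram over the preceding target character, and the two probabilities are combined by linear interpolation with a hypothetical `weight` (the claim does not say how they are combined). All dictionary entries and probability values are made-up sample data.

```python
# Hypothetical sample data; a real system would load these from its data resources.
WORDS = {"软件": "軟體", "头发": "頭髮", "发展": "發展"}   # word-level entries
ONE_TO_ONE = {"软": "軟", "头": "頭", "学": "學"}           # unambiguous characters
ONE_TO_MANY = {"发": ["發", "髮"]}                          # ambiguous characters
PRIORITY_PROB = {"發": 0.7, "髮": 0.3}                      # assumed priority probabilities
BIGRAM_PROB = {("理", "髮"): 0.9, ("理", "發"): 0.01}       # assumed P(candidate | previous char)


def reverse_maximum_match(text, lexicon, max_len=4):
    """Segment `text` from right to left, preferring the longest lexicon entry."""
    tokens = []
    end = len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                end -= size
                break
    tokens.reverse()
    return tokens


def pick_candidate(prev, candidates, weight):
    """Choose the candidate with the best interpolated score (an assumed formula)."""
    def score(cand):
        return (weight * PRIORITY_PROB.get(cand, 0.0)
                + (1 - weight) * BIGRAM_PROB.get((prev, cand), 0.0))
    return max(candidates, key=score)


def convert(text, weight=0.5):
    # Pass 1: convert words and one-to-one characters; tag one-to-many characters.
    tagged = []
    for token in reverse_maximum_match(text, WORDS):
        if token in WORDS:
            tagged.append((WORDS[token], None))
        elif token in ONE_TO_MANY:
            tagged.append((token, ONE_TO_MANY[token]))
        else:
            tagged.append((ONE_TO_ONE.get(token, token), None))
    # Pass 2: resolve each tagged character using the character to its left.
    out = []
    for value, candidates in tagged:
        if candidates is not None:
            prev = out[-1][-1] if out else ""
            value = pick_candidate(prev, candidates, weight)
        out.append(value)
    return "".join(out)
```

With this sample data, `convert("软件发展")` is resolved entirely by word entries, while `convert("理发")` falls through to the one-to-many step, where the bigram context outweighs the higher priority probability of the other candidate.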
Address: CN