Title: AUTOMATIC IDENTIFICATION METHOD FOR KEY LANGUAGE OF SAMPLE TEXT
Abstract: PROBLEM TO BE SOLVED: To provide a new automatic language identification method that uses both short words and N-grams. SOLUTION: Text data 10 define a sample text having a key language. Set 14 is the full set of natural languages for which probability data 12 are available. Probability data 12 include N-gram probability data for a first subset of these languages and word probability data for a second subset. For each language in the first subset, the N-gram probability data indicate how frequently each N-gram occurs in text whose key language is that language. Likewise, the word probability data indicate how frequently each word occurs. Text data 10 and probability data 12 are then used to automatically obtain sample probability data 20, which indicate a sample probability for each language in a third subset, namely the intersection of the first and second subsets. Sample probability data 20 are in turn used to automatically obtain language identification data 26, which identify the language in the third subset for which sample probability data 20 indicate the highest probability.
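The abstract does not state how the N-gram and word evidence are combined. The sketch below is a minimal illustration under stated assumptions, not the patented implementation: it scores each language by summing log probabilities of character trigrams and of whole words (a naive-Bayes-style combination), pads words with `_` as a boundary marker, and gives unseen items an assumed floor probability of `1e-6`. The tables `NGRAM_PROBS` and `WORD_PROBS` are hypothetical toy data standing in for probability data 12; `text` plays the role of text data 10, the returned score dictionary stands in for sample probability data 20, and the chosen language code stands in for language identification data 26. All function names are hypothetical.

```python
import math
import re
from typing import Dict, List

Table = Dict[str, Dict[str, float]]

# Hypothetical toy tables standing in for probability data 12.
# Keys are language codes; values map an item to an assumed relative frequency.
NGRAM_PROBS: Table = {
    "en": {"_th": 0.02, "the": 0.03, "he_": 0.02, "_an": 0.01,
           "and": 0.02, "nd_": 0.01, "_is": 0.01, "is_": 0.01},
    "fr": {"_le": 0.02, "le_": 0.02, "_la": 0.02, "la_": 0.02,
           "_et": 0.01, "et_": 0.01, "_es": 0.01, "est": 0.01, "st_": 0.01},
    # German has N-gram data only, so it falls outside the third subset.
    "de": {"_de": 0.02, "der": 0.02, "er_": 0.03},
}
# Word tables hold short, frequent words (assumed contents).
WORD_PROBS: Table = {
    "en": {"the": 0.06, "and": 0.03, "of": 0.03, "is": 0.01},
    "fr": {"de": 0.04, "le": 0.03, "la": 0.03, "et": 0.02, "est": 0.01},
}

FLOOR = 1e-6  # assumed probability for items missing from a table
N = 3         # assumed N-gram length (character trigrams)


def char_ngrams(word: str, n: int = N) -> List[str]:
    """Character N-grams of a word padded with '_' as a boundary marker."""
    padded = f"_{word}_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def sample_log_probs(text: str, ngram_probs: Table, word_probs: Table) -> Dict[str, float]:
    """Log-probability score of the text for each language in the third subset."""
    words = re.findall(r"\w+", text.lower())
    # Third subset: languages that have BOTH N-gram and word data.
    third_subset = sorted(set(ngram_probs) & set(word_probs))
    scores: Dict[str, float] = {}
    for lang in third_subset:
        score = 0.0
        for word in words:
            # N-gram evidence for this word.
            for gram in char_ngrams(word):
                score += math.log(ngram_probs[lang].get(gram, FLOOR))
            # Whole-word evidence for this word.
            score += math.log(word_probs[lang].get(word, FLOOR))
        scores[lang] = score
    return scores


def identify_language(text: str, ngram_probs: Table, word_probs: Table) -> str:
    """Return the language with the highest score (the identification result)."""
    scores = sample_log_probs(text, ngram_probs, word_probs)
    if not scores:
        raise ValueError("no language has both N-gram and word data")
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(identify_language("the cat and the dog", NGRAM_PROBS, WORD_PROBS))
    print(identify_language("le chat et la souris", NGRAM_PROBS, WORD_PROBS))
```

Run as a script, this prints `en` and then `fr`. Restricting scoring to the intersection of the two tables mirrors the third subset: every scored language receives the same number of N-gram and word terms, so the summed log scores are directly comparable.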
Publication number: JP2000194696(A)   Publication date: 2000.07.14
Application number: JP19990350916   Filing date: 1999.12.10
Applicant: XEROX CORP   Inventor: SCHULZE BRUNO M
Classification: G06F17/27; G06F17/28; (IPC1-7): G06F17/27   Main classification: G06F17/27