Title of invention METHOD FOR IDENTIFYING LANGUAGE/CHARACTER CODE SYSTEM
Abstract A method for mechanically identifying the language and character code system of a text document encoded on a computer. For each target language/character code system, a list LBSL/C of byte strings of specified length is formed in advance. It stores the byte strings of a specified number of bytes that can occur in text documents of that language/character code system. For each language/character code system, an “occurrence rate of learnt byte strings” is then calculated: the proportion of the fixed-length byte strings contained in the target text document that already exist in the list LBSL/C. Only when exactly one language/character code system has an “occurrence rate of learnt byte strings” close to 1 is that language/character code system output as the result.
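The identification step can be illustrated with a minimal sketch. Several details are assumptions not stated in the abstract: the byte-string length is fixed at n = 2, the document is split into overlapping windows at every offset, each LBSL/C is learnt by collecting every n-byte string from sample texts, and "close to 1" is modelled as a hypothetical `threshold` of 0.95. The function names `byte_strings`, `learn`, `occurrence_rate` and `identify` are illustrative only.

```python
from typing import Dict, Iterable, List, Optional, Set


def byte_strings(data: bytes, n: int) -> List[bytes]:
    """Split data into byte strings of n bytes.

    Assumption: overlapping windows at every offset; the abstract does not
    say how the document is segmented.
    """
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def learn(samples: Iterable[bytes], n: int = 2) -> Set[bytes]:
    """Build a hypothetical LBSL/C: every n-byte string seen in sample texts."""
    lbsl: Set[bytes] = set()
    for sample in samples:
        lbsl.update(byte_strings(sample, n))
    return lbsl


def occurrence_rate(doc: bytes, lbsl: Set[bytes], n: int = 2) -> float:
    """Fraction of the document's n-byte strings already present in lbsl."""
    strings = byte_strings(doc, n)
    if not strings:
        return 0.0  # document too short to judge
    hits = sum(1 for s in strings if s in lbsl)
    return hits / len(strings)


def identify(doc: bytes, lists: Dict[str, Set[bytes]], n: int = 2,
             threshold: float = 0.95) -> Optional[str]:
    """Return the single system whose rate is close to 1, else None.

    threshold is an assumed stand-in for "close to 1".
    """
    candidates = [name for name, lbsl in lists.items()
                  if occurrence_rate(doc, lbsl, n) >= threshold]
    # Output a result only when exactly one system qualifies.
    return candidates[0] if len(candidates) == 1 else None
```

For example, with one list learnt from Japanese text encoded as Shift_JIS and another from the same text encoded as EUC-JP, a Shift_JIS document scores near 1 only against the Shift_JIS list, because EUC-JP byte pairs lie in a different byte range. Requiring exactly one candidate means an ambiguous or unknown document yields `None` instead of a guess.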
Publication number WO02095614(A1) Publication date 2002.11.28
Application number WO2001JP04350 Application date 2001.05.24
Applicant SUZUKI, IZUMI Inventor SUZUKI, IZUMI
Classification G06F17/22;G06F17/27;(IPC1-7):G06F17/21 Main classification G06F17/22