Title of invention: Classifying data using fingerprint of character encoding
Abstract: A method is disclosed for classifying data according to the character encoding in which it has been encoded. A fingerprint (62) is constructed from the data. For each of a plurality of predetermined character encoding schemes, the fingerprint comprises at least one confidence value representing a confidence that the data was encoded using that character encoding scheme. The fingerprint also comprises a frequency value for each of a subset of byte values, each frequency value representing the frequency of occurrence of a respective byte value in the data. A statistical classification of the data is then performed based on the fingerprint. The method then preferably identifies a language represented by textual data in the classified data and applies a language-specific policy based on the identified language.
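The fingerprint described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the candidate encodings in `ENCODINGS`, the byte subset in `TRACKED_BYTES`, and the confidence heuristic (the share of decoded characters that are not the U+FFFD replacement character) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
# Hypothetical choices; the abstract does not name the encodings or byte subset.
ENCODINGS = ("utf-8", "shift_jis", "iso-8859-1")
TRACKED_BYTES = (0x00, 0xC3, 0xE3, 0xFF)


def confidence(data: bytes, encoding: str) -> float:
    """Assumed heuristic: fraction of decoded characters that decoded cleanly."""
    if not data:
        return 0.0
    # errors="replace" substitutes U+FFFD for every undecodable sequence.
    text = data.decode(encoding, errors="replace")
    if not text:
        return 0.0
    return 1.0 - text.count("\ufffd") / len(text)


def byte_frequencies(data: bytes) -> list[float]:
    """Relative frequency of each tracked byte value in the data."""
    if not data:
        return [0.0] * len(TRACKED_BYTES)
    return [data.count(bytes([value])) / len(data) for value in TRACKED_BYTES]


def fingerprint(data: bytes) -> list[float]:
    """One confidence value per candidate encoding, then the byte frequencies."""
    confidences = [confidence(data, enc) for enc in ENCODINGS]
    return confidences + byte_frequencies(data)


if __name__ == "__main__":
    sample = "héllo".encode("utf-8")
    print(fingerprint(sample))
```

The resulting vector is the kind of input a statistical classifier could consume. The classifier itself, and the language-specific policy applied afterwards, are outside the scope of this sketch.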
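Publication number: GB2489512(A) Publication date: 2012.10.03
Application number: GB20110005509 Filing date: 2011.03.31
Applicant: CLEARSWIFT LIMITED Inventors: KEVIN SCHOFIELD; ISTVAN BIRO
Classification: G06F17/22 Main classification: G06F17/22
Agency: Agent:
Principal claim:
Address: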