Classifying data using fingerprint of character encoding, Application No. GB20110005509 - 传众专利搜索 (Chuanzhong Patent Search)

Title of invention	Classifying data using fingerprint of character encoding
Abstract	A method is disclosed for classifying data according to the character encoding in which it has been encoded. A fingerprint (62) is constructed from the data, wherein the fingerprint comprises, for each of a plurality of predetermined character encoding schemes, at least one confidence value representing a confidence that the data was encoded using said character encoding scheme. The fingerprint also comprises a frequency value for each of a subset of byte values, each frequency value representing the frequency of occurrence of a respective byte value in the data. A statistical classification of the data is then performed based on the fingerprint. The method then preferably identifies a language represented by textual data in the classified data and applies a language-specific policy based on the identified language.
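Publication No.	GB2489512(A)	Publication date	2012.10.03
Application No.	GB20110005509	Filing date	2011.03.31
Applicant	CLEARSWIFT LIMITED	Inventors	KEVIN SCHOFIELD;ISTVAN BIRO
Classification	G06F17/22	Main classification	G06F17/22
Agency		Agent
Independent claim
Address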
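
The fingerprint described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The encoding list, the tracked byte values, the all-or-nothing confidence score (1.0 if the data decodes strictly, 0.0 otherwise), and the nearest-centroid classifier are all assumptions chosen for illustration. The abstract does not say how confidence values are computed or which statistical classifier is used.

```python
from collections import Counter
import math

# Hypothetical choices for illustration; the record does not list them.
ENCODINGS = ("ascii", "utf-8", "shift_jis", "gb2312")
TRACKED_BYTES = (0x1B, 0x80, 0xE3, 0xE6)


def decode_confidence(data: bytes, encoding: str) -> float:
    """Crude stand-in for a confidence value: 1.0 if strict decoding succeeds."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return 0.0
    return 1.0


def build_fingerprint(data: bytes) -> list[float]:
    """Return confidence values per encoding followed by byte frequencies."""
    counts = Counter(data)          # iterating bytes yields integer byte values
    total = len(data) or 1          # avoid division by zero on empty input
    confidences = [decode_confidence(data, enc) for enc in ENCODINGS]
    frequencies = [counts[b] / total for b in TRACKED_BYTES]
    return confidences + frequencies


def classify(fingerprint: list[float], centroids: dict[str, list[float]]) -> str:
    """Assumed classifier: pick the label whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(fingerprint, centroids[label]))
```

In use, the centroids would be learned from labelled sample data. A language-specific policy could then be looked up from the returned label, for example with a dictionary mapping labels to handlers.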