Title of invention: ACQUISITION OF MALICIOUS CODE USING ACTIVE LEARNING
Abstract: A method for efficiently detecting unknown malicious code. A dataset is created as a collection of files comprising a first subset of malicious code files and a second subset of benign code files, with the malicious and benign files identified by an antivirus program. All files are parsed using n-gram moving windows of several lengths. The term frequency (TF) representation is computed for each n-gram in each file, and an initial set of top features is selected from all n-grams based on the document frequency (DF) measure. Feature selection methods then reduce the number of top features to stay within the computational resources required for classifier training. The optimal number of features is determined by evaluating the detection accuracy of several sets of reduced top features. A dataset in which the proportion of benign files is greater than that of malicious files is prepared, and a portion of this dataset is used for training the classifier. New malicious code within a stream of new files is automatically detected and acquired using Active Learning.
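The parsing, TF/DF, and acquisition steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. All function names are hypothetical, and the sketch makes three assumptions that the abstract does not state: n-grams are taken over raw bytes, TF is normalized by the count of the most frequent n-gram in the file, and the Active Learning step is approximated by plain uncertainty sampling, which picks the files whose predicted malicious probability is closest to 0.5.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int):
    """Slide a window of n bytes over the file content (assumed byte-level n-grams)."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def term_frequencies(data: bytes, n: int) -> dict:
    """TF of each n-gram in one file.

    Assumption: counts are normalized by the most frequent n-gram in the file.
    """
    counts = Counter(byte_ngrams(data, n))
    if not counts:
        return {}
    peak = max(counts.values())
    return {gram: count / peak for gram, count in counts.items()}


def top_features_by_df(files, n: int, k: int):
    """Return the k n-grams that appear in the largest number of files (DF)."""
    df = Counter()
    for data in files:
        # Use a set so each file contributes at most once per n-gram.
        df.update(set(byte_ngrams(data, n)))
    # Sort by DF descending; ties are broken by byte order for determinism.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:k]]


def select_for_labeling(scores: dict, budget: int):
    """Uncertainty sampling (assumed criterion): pick the files whose
    malicious-probability score is closest to 0.5."""
    ranked = sorted(scores, key=lambda name: (abs(scores[name] - 0.5), name))
    return ranked[:budget]


if __name__ == "__main__":
    # Toy byte strings standing in for real executables.
    corpus = [b"\x90\x90\xeb\xfe", b"\x90\x90\x00\x00", b"\x00\x00\x90\x90"]
    print(top_features_by_df(corpus, n=2, k=2))
    print(term_frequencies(b"\x90\x90\x90\xeb", 2))
    # Hypothetical classifier scores for a stream of new files.
    print(select_for_labeling({"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45}, 2))
```

In a fuller pipeline, the selected top features would be reduced further by a feature selection method before classifier training, and the files returned by `select_for_labeling` would be sent for expert labeling and added to the training set.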
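Publication number: IL195081(D0) Publication date: 2011.08.01
Application number: IL20080195081 Filing date: 2008.11.03
Applicant: DEUTCHE TELEKOM AG;BEN GURION UNIVERSITY OF THE NEGEV RESEARCH AND DEVELOPMENT AUTHORITY Inventor:
Classification: G06F21/56 Main classification: G06F21/56
Agency: Agent:
Principal claim:
Address: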