Title of invention: System and method for statistical analysis of comparative entropy
Abstract: In accordance with one embodiment of the present disclosure, a method for determining the similarity between a first data set and a second data set is provided. The method includes performing an entropy analysis on the first and second data sets to produce a first entropy result, wherein the first data set comprises data representative of a first one or more computer files of known content and the second data set comprises data representative of one or more computer files of unknown content; analyzing the first entropy result; and, if the first entropy result is within a predetermined threshold, identifying the second data set as substantially related to the first data set.
Publication number: US9501640(B2); Publication date: 2016.11.22
Application number: US201113232718; Filing date: 2011.09.14
Applicant: McAfee, Inc.; Inventors: Beveridge David Neill; Karnik Abhishek Ajay; Beets Kevin A.; Heppner Tad M.; Raman Karthik
Classification: G06F11/00; G06F21/56; G06F12/14; G06F12/16; Main classification: G06F11/00
Agency: Baker Botts L.L.P.; Agent: Baker Botts L.L.P.
Independent claim: 1. At least one non-transitory machine readable storage medium, comprising computer-executable instructions carried on the computer readable medium, the instructions readable by a processor, the instructions, when read and executed, for causing the processor to: perform an entropy analysis on a first data set and a second data set to produce a first entropy result, wherein: the first data set comprises data representative of a probability distribution function of token values associated with a first one or more computer files of known content and the second data set comprises data representative of a probability distribution function of token values associated with one or more computer files of unknown content; and the entropy analysis includes causing the processor to: compare token values between the probability distribution function of the computer files of known content and the probability distribution function of the computer files of unknown content; generate the first entropy result based at least upon a difference between an expected number of occurrences of the token values in the probability distribution function of the computer files of known content and an actual number of occurrences of the token values in the probability distribution function of the computer files of unknown content; based on a determination that the first entropy result is within a predetermined threshold, identify the second data set as substantially related to the first data set; based upon identification of the second data set as substantially related to the first data set, identify malware resident on an electronic device.
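Illustrative sketch (not part of the patent text): the comparison recited in claim 1 can be pictured roughly as follows. This is a minimal Python sketch under several assumptions that the record does not specify: tokens are taken to be single byte values, the "difference between expected and actual occurrences" is scored with a chi-square-style statistic normalised by file length, and the helper names (token_pdf, comparative_score, is_related), the smoothing constant, and the threshold of 0.5 are all hypothetical.

```python
from collections import Counter
from typing import Sequence

NUM_TOKENS = 256  # assumption: one token per possible byte value


def token_pdf(data: bytes, smoothing: float = 1.0) -> list:
    """Estimate the probability of each byte value in known-content data.

    Additive smoothing (an assumption of this sketch) keeps every
    probability non-zero so the later division is always defined.
    """
    counts = Counter(data)
    total = len(data) + smoothing * NUM_TOKENS
    return [(counts.get(t, 0) + smoothing) / total for t in range(NUM_TOKENS)]


def comparative_score(known_pdf: Sequence[float], unknown: bytes) -> float:
    """Score how far the unknown data departs from the known distribution.

    For each token, the expected count is the known probability times the
    length of the unknown data; the actual count is taken from the unknown
    data. The squared differences are summed chi-square style and divided
    by the length so files of different sizes are comparable.
    """
    n = len(unknown)
    if n == 0:
        raise ValueError("unknown data set is empty")
    observed = Counter(unknown)
    score = 0.0
    for token, probability in enumerate(known_pdf):
        expected = probability * n
        diff = observed.get(token, 0) - expected
        score += diff * diff / expected
    return score / n


def is_related(known_pdf: Sequence[float], unknown: bytes,
               threshold: float = 0.5) -> bool:
    """Treat the unknown data as related when the score is within the
    (hypothetical) predetermined threshold."""
    return comparative_score(known_pdf, unknown) <= threshold


if __name__ == "__main__":
    known = b"abcabcabc" * 100          # stand-in for known-content files
    pdf = token_pdf(known)
    similar = b"abcabc" * 50            # same token mix as the known data
    different = bytes(range(256)) * 4   # uniform byte mix
    print("similar related:", is_related(pdf, similar))
    print("different related:", is_related(pdf, different))
```

In practice the token definition, the statistic, and the threshold would have to be chosen and tuned for the file types being compared; the values here exist only to make the sketch runnable.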
Address: Santa Clara, CA, US