Abstract |
A plurality of data files is received. Thereafter, each file is represented as an entropy time series that reflects an amount of entropy across locations in code for such file. A wavelet transform is applied, for each file, to the corresponding entropy time series to generate an energy spectrum characterizing, for the file, an amount of entropic energy at multiple scales of code resolution. It can then be determined, for each file, whether or not the file is likely to be malicious based on the energy spectrum. Related apparatus, systems, techniques and articles are also described. |
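As an illustration of the "entropy time series" representation, the following minimal sketch computes the Shannon entropy of successive fixed-length byte chunks of a file. The function names, the 256-byte chunk size, the use of bytes as the "characters", and the choice to drop a trailing partial chunk are assumptions made for this example; the record itself does not fix them.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    # H = sum over observed byte values of p * log2(1 / p), with p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_time_series(data: bytes, chunk_size: int = 256) -> list[float]:
    """Entropy of each non-overlapping fixed-length chunk, in file order.

    Assumption of this sketch: a trailing partial chunk is dropped.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    n_chunks = len(data) // chunk_size
    return [
        shannon_entropy(data[i * chunk_size:(i + 1) * chunk_size])
        for i in range(n_chunks)
    ]
```

For example, a 256-byte run of zeros followed by one occurrence of every byte value yields a series that jumps from the minimum entropy (0 bits) to the maximum (8 bits), the kind of transition an encrypted or compressed segment can produce.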
Independent claim |
1. A method implemented by at least one data processor forming at least part of a computing system, the method comprising:
receiving, by the at least one data processor, a plurality of machine-readable data files;
analyzing, by the at least one data processor, each data file to obtain characters contained in the plurality of data files, the characters split into a plurality of non-overlapping file chunks of fixed length;
representing, by the at least one data processor, each file as an entropy time series that reflects an amount of entropy across the plurality of non-overlapping fixed-length file chunks for each file;
applying, by the at least one data processor, for each file, a wavelet transform to the corresponding entropy time series to generate an energy spectrum characterizing, for the file, an amount of entropic energy at multiple scales of code resolution, the wavelet transform is applied based on at least a coefficient representing a difference of mean entropy levels between the adjacent plurality of non-overlapping fixed-length file chunks in each of the plurality of data files; and
determining, by the at least one data processor, for each file, whether or not the file is likely to be malicious based on the energy spectrum, wherein at least one of the files determined to be likely malicious comprises encrypted and/or compressed segments concealing malicious commands. |
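The wavelet step of the claim can be sketched as follows. The claim does not name a wavelet family; this example assumes the Haar wavelet, whose detail coefficients at each scale are proportional to the difference between the mean values of adjacent blocks, which matches the coefficient the claim describes. The function name and the power-of-two length requirement are assumptions of the sketch, not part of the claim.

```python
import math


def haar_energy_spectrum(series: list[float]) -> list[float]:
    """Energy of Haar detail coefficients at each scale, finest scale first.

    Assumption of this sketch: len(series) is a power of two, so the caller
    truncates or pads the entropy time series beforehand.
    """
    n = len(series)
    if n < 2 or n & (n - 1):
        raise ValueError("series length must be a power of two >= 2")
    approx = list(series)
    spectrum = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficient: proportional to the difference between the
        # mean entropy levels of two adjacent blocks at the current scale.
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        # Approximation coefficient: proportional to the mean of the pair,
        # carried forward to the next, coarser scale.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        # Energy at this scale: sum of squared detail coefficients.
        spectrum.append(sum(d * d for d in detail))
    return spectrum
```

Under these assumptions, an entropy series that alternates chunk by chunk, such as `[0.0, 8.0, 0.0, 8.0]`, concentrates its energy at the finest scale, while one that shifts once halfway through, such as `[0.0, 0.0, 8.0, 8.0]`, concentrates it at the coarsest scale. The per-scale energies are the kind of feature vector the final determining step could consume; the claim as quoted does not specify how that determination is made.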