Title Similarity search and malware prioritization
Abstract Methods, systems, and media for determining similar malware samples are disclosed. Two or more malware samples are received and analyzed to extract information from them. The extracted information is converted to a plurality of sets of strings. A similarity between the two or more malware samples is determined based on the plurality of sets of strings.
Publication number US9197665(B1) Publication date 2015.11.24
Application number US201514641503 Filing date 2015.03.09
Applicant Inventors Cabot Charles;Borbely Rebecca A.;West Michael W.;Raugas Mark V.
Classification G06F11/00;H04L29/06;G06F17/30 Main classification G06F11/00
Agency Fish & Richardson P.C. Agent Fish & Richardson P.C.
Main claim 1. A computer-implemented method for processing a malware sample executed by one or more computer processors, the method comprising: receiving two or more malware samples; analyzing, by the one or more computer processors, the two or more malware samples to extract information from the two or more malware samples; generating, by the one or more computer processors, at least one set of strings for each of the two or more malware samples using the extracted information, wherein the at least one set of strings includes a first set of strings generated from extracted information corresponding to a first malware sample and a second set of strings generated from extracted information corresponding to a second malware sample; determining, by the one or more computer processors, a similarity between the two or more malware samples based on the at least one set of strings for each of the two or more malware samples, determining the similarity comprising: determining a similarity index associated with the first set of strings and the second set of strings by: determining a union of a first data set associated with the first set of strings and a second data set associated with the second set of strings, determining an intersection of the first data set and the second data set, and dividing the intersection by the union, determining a distance based on the similarity index by subtracting a result of the dividing from one, and determining the similarity based on the distance; and providing, for display to a user, an output indicating the similarity between the two or more malware samples.
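Illustration The similarity index recited in the claim (intersection divided by union) matches what is commonly called the Jaccard index, and one minus that index is the Jaccard distance. The following is a minimal sketch of that arithmetic only, not the patented implementation. The function name `jaccard_distance`, the example strings, the reading of "dividing the intersection by the union" as a ratio of set sizes, and the handling of two empty sets are all assumptions made for illustration. The claim does not say how the final similarity is derived from the distance; the sketch simply uses one minus the distance.

```python
def jaccard_distance(first_strings, second_strings):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of strings."""
    first = set(first_strings)
    second = set(second_strings)

    union = first | second
    if not union:
        # Assumption: two empty sets are treated as identical (distance 0).
        return 0.0

    intersection = first & second
    similarity_index = len(intersection) / len(union)
    return 1.0 - similarity_index


# Hypothetical string sets, standing in for information extracted
# from two malware samples.
sample_a = {"CreateFileA", "RegSetValueExA", "connect", "send"}
sample_b = {"CreateFileA", "connect", "send", "recv"}

distance = jaccard_distance(sample_a, sample_b)
similarity = 1.0 - distance  # assumed mapping from distance to similarity
print(f"distance={distance:.2f} similarity={similarity:.2f}")
```

Here the two sets share three strings out of five distinct strings overall, so the index is 3/5 and the distance is 2/5.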
Address