Title of invention: Detecting electronic messaging threats by using metric trees and similarity hashes
Abstract: Each node of a metric tree comprises a similarity hash of a member of a dataset of known message threats, calculated using a given similarity hashing algorithm. The nodes are organized into the tree and positioned such that the differences between the similarity hashes are represented as distances between the nodes. Messages are received and tested to determine whether they are malicious. When a message is received, a similarity hash of the message is calculated using the same similarity hashing algorithm that is used to calculate the hashes of the members of the dataset. The tree is searched for a hash of a known message threat that is within a threshold distance of the hash of the received message. Searching the tree can take the form of a traversal from the root node to determine whether the tree contains a node within the similarity threshold.
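As a rough illustration of what a similarity hash provides, the sketch below uses SimHash over character shingles with a Hamming distance. This is a swapped-in technique chosen for brevity: the abstract does not name the hashing algorithm, and the function names `simhash` and `hamming`, the 64-bit width, and the shingle size of 4 are all assumptions made for this example. The point it shows is that known threats and incoming messages must be hashed by the same function so that their distances are comparable.

```python
import hashlib


def simhash(text: str, bits: int = 64, shingle: int = 4) -> int:
    """Toy SimHash (an assumption, not the algorithm named in the record)."""
    # Normalize case and whitespace so trivial edits do not change the hash.
    normalized = " ".join(text.lower().split())
    # Overlapping character shingles act as the features of the message.
    shingles = {
        normalized[i:i + shingle]
        for i in range(max(1, len(normalized) - shingle + 1))
    }
    counts = [0] * bits
    for s in shingles:
        raw = hashlib.sha256(s.encode("utf-8")).digest()[: bits // 8]
        digest = int.from_bytes(raw, "big")
        # Each feature votes +1 or -1 on every bit position.
        for b in range(bits):
            counts[b] += 1 if (digest >> b) & 1 else -1
    # A bit is set in the result when the positive votes win.
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hash values differ."""
    return bin(a ^ b).count("1")


known_threat = simhash("Win a free prize now")
incoming = simhash("win  a FREE prize now")
# Same text after normalization, so the distance is 0.
print(hamming(known_threat, incoming))
```

With a hash of this kind, a small distance suggests near-duplicate content, which is the property the metric tree relies on when it is searched against a threshold.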
Publication number: US9565209(B1) Publication date: 2017.02.07
Application number: US201514675545 Filing date: 2015.03.31
Applicant: Symantec Corporation Inventors: Grzonkowski Slawomir;Lopez Alejandro Mosquera;Morss Dylan;Aouad Lamine
Classification: H04L29/06;H04L12/58;G06F17/30;G06F21/56 Main classification: H04L29/06
Agency: Patent Law Works LLP Agent: Patent Law Works LLP
Principal claim: 1. A computer implemented method for detecting electronic messaging threats by using metric trees and similarity hashes, the method comprising: maintaining, by a computer, a metric tree comprising a plurality of nodes, each node comprising a similarity hash value of a member of a dataset of known electronic message threats, each similarity hash value calculated using a specific similarity hashing algorithm, wherein the plurality of nodes are organized as the metric tree with differences between the similarity hash values represented as distances between the nodes; receiving, by the computer, an electronic message, wherein a status of the received electronic message as being benign or malicious is not known; calculating a similarity hash value of the received electronic message, by the computer, wherein the similarity hash value of the received electronic message is calculated using the specific similarity hashing algorithm that is used to calculate the similarity hash values of the members of the dataset of known electronic message threats; searching the metric tree, by the computer, for a similarity hash value of a known electronic message threat within a predetermined threshold of distance to the similarity hash value of the received electronic message: comprising: traversing the metric tree starting at its root node, to determine whether there is a specific node within the metric tree that has a distance of no more than the predetermined threshold to the similarity hash value of the received electronic message, comprising: until 1) a specific node within the metric tree that has a distance of no more than the predetermined threshold to the similarity hash value of the received electronic message is found, or 2) there are no further branches of the metric tree to be traversed: calculating a distance between a compare node and the similarity hash value of the received electronic message, wherein on a first pass the compare node comprises the root node of the metric tree; determining whether to traverse a left branch of the compare node, by calculating a remainder from subtracting the predetermined distance threshold from the distance between the compare node and the similarity hash value of the received electronic message, and only in response to the remainder being less than or equal to an edit distance to the compare node, determining to traverse the left branch of the compare node; determining whether to traverse a right branch of the compare node, by calculating a sum from adding the predetermined distance threshold to the distance between the compare node and the similarity hash value of the received electronic message, and only in response to the sum being greater than or equal to an edit distance to the compare node, determining to traverse the right branch of the compare node; and setting the compare node to a root node of a branch to be traversed; and responsive to results of the searching, adjudicating the received electronic message as being benign or malicious, by the computer.
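The traversal recited in the claim can be sketched as a vantage-point style metric tree. This is a minimal sketch under stated assumptions, not the patented implementation: it reads "an edit distance to the compare node" as a split radius stored in each node, uses Levenshtein distance over hash strings, and maps a match to "malicious". The names `Node`, `build`, `find_within`, and `adjudicate`, and the sample hash strings, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two hash strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


@dataclass
class Node:
    hash_value: str
    radius: int = 0  # assumed reading of "an edit distance to the compare node"
    left: Optional["Node"] = None   # hashes at distance <= radius
    right: Optional["Node"] = None  # hashes at distance > radius


def build(hashes: List[str]) -> Optional[Node]:
    """Organize known-threat hashes so that distances drive the layout."""
    if not hashes:
        return None
    vantage, rest = hashes[0], hashes[1:]
    node = Node(vantage)
    if rest:
        dists = [edit_distance(vantage, h) for h in rest]
        node.radius = sorted(dists)[len(dists) // 2]
        node.left = build([h for h, d in zip(rest, dists) if d <= node.radius])
        node.right = build([h for h, d in zip(rest, dists) if d > node.radius])
    return node


def find_within(root: Optional[Node], query: str, threshold: int) -> Optional[str]:
    """Return a stored hash within `threshold` of `query`, or None."""
    pending = [root]
    while pending:
        node = pending.pop()
        if node is None:
            continue
        d = edit_distance(node.hash_value, query)
        if d <= threshold:
            return node.hash_value  # stop at the first node within the threshold
        if d - threshold <= node.radius:  # left-branch test from the claim
            pending.append(node.left)
        if d + threshold >= node.radius:  # right-branch test from the claim
            pending.append(node.right)
    return None


def adjudicate(root: Optional[Node], message_hash: str, threshold: int) -> str:
    # Assumption: any match within the threshold is treated as malicious.
    return "malicious" if find_within(root, message_hash, threshold) else "benign"


known = ["a1b2c3d4", "ffee0011", "a1b2c3ff", "12345678"]  # made-up hash strings
tree = build(known)
print(adjudicate(tree, "a1b2c3d5", 1))  # malicious
print(adjudicate(tree, "zzzzzzzz", 2))  # benign
```

The two branch tests follow from the triangle inequality: a stored hash within the threshold of the query can sit in the left branch only if the query distance minus the threshold does not exceed the radius, and in the right branch only if the query distance plus the threshold reaches it. Branches that fail the test are skipped without computing any further distances.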
Address: Mountain View CA US