主权项 |
1. A computer implemented method for detecting electronic messaging threats by using metric trees and similarity hashes, the method comprising: maintaining, by a computer, a metric tree comprising a plurality of nodes, each node comprising a similarity hash value of a member of a dataset of known electronic message threats, each similarity hash value calculated using a specific similarity hashing algorithm, wherein the plurality of nodes are organized as the metric tree with differences between the similarity hash values represented as distances between the nodes; receiving, by the computer, an electronic message, wherein a status of the received electronic message as being benign or malicious is not known; calculating a similarity hash value of the received electronic message, by the computer, wherein the similarity hash value of the received electronic message is calculated using the specific similarity hashing algorithm that is used to calculate the similarity hash values of the members of the dataset of known electronic message threats; searching the metric tree, by the computer, for a similarity hash value of a known electronic message threat within a predetermined threshold of distance to the similarity hash value of the received electronic message: comprising: traversing the metric tree starting at its root node, to determine whether there is a specific node within the metric tree that has a distance of no more than the predetermined threshold to the similarity hash value of the received electronic message, comprising: until 1) a specific node within the metric tree that has a distance of no more than the predetermined threshold to the similarity hash value of the received electronic message is found, or 2) there are no further branches of the metric tree to be traversed: calculating a distance between a compare node and the similarity hash value of the received electronic message, wherein on a first pass the compare node comprises the root node of the metric tree; determining whether to traverse a left branch of the compare node, by calculating a remainder from subtracting the predetermined distance threshold from the distance between the compare node and the similarity hash value of the received electronic message, and only in response to the remainder being less than or equal to an edit distance to the compare node, determining to traverse the left branch of the compare node; determining whether to traverse a right branch of the compare node, by calculating a sum from adding the predetermined distance threshold to the distance between the compare node and the similarity hash value of the received electronic message, and only in response to the sum being greater than or equal to an edit distance to the compare node, determining to traverse the right branch of the compare node; and setting the compare node to a root node of a branch to be traversed; and responsive to results of the searching, adjudicating the received electronic message as being benign or malicious, by the computer. |