Title of invention Syntactical Fingerprinting
Abstract A method for identifying phishing websites and illustrating the provenance of each website through the structural components that compose it. The method includes identifying newly observed phishing websites and using the method as a distance metric for clustering phishing websites. Varying the threshold value within the method demonstrates the potential for phishing investigators to identify the source of many phishing websites as well as individual phishers.
Publication number US2015067839(A1) Publication date 2015.03.05
Application number US201214352601 Filing date 2012.07.09
Applicant Wardman Brad;Haddock Walker Inventor Wardman Brad;Haddock Walker
Classification H04L29/06;G06F17/30 Main classification H04L29/06
Agency Agent
Principal claim 1. A method for identifying a phishing website comprising:
a. providing a computer system having an operating system, a database system and a communication system for controlling communications through the Internet,
b. transmitting a communication containing a plurality of suspected phishing URLs to the computer system,
c. retrieving website content files for each suspected phishing URL of the plurality of phishing URLs, the website content files including structural components,
d. preprocessing the website content files thereby producing normalized website content file sets for each of the plurality of suspected phishing URLs,
e. creating an abstract syntax tree for each of the normalized website content file sets,
f. calculating a hash value for each structural component of each of the normalized website content file sets and constructing a hash value set therefrom for each normalized website content file set,
g. selecting a first hash value from a first hash value set and comparing the first hash value to hash values of structural components of known phishing websites to locate a matching hash value,
h. if a matching hash value is located, comparing the first hash value set to a hash value set of the matching hash value and creating a similarity score, and
i. if the similarity score meets or exceeds a predetermined threshold, designating a suspected URL from which the first hash value was derived as a phishing website.
Address Phoenix AZ US
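
Steps d through i of claim 1 can be illustrated with a minimal Python sketch. It is not the applicants' implementation, and several details are assumptions because the record does not specify them: the set of tags treated as structural components (`COMPONENT_TAGS`), the normalization (tag names and sorted attribute names are kept, attribute values and text are dropped), SHA-256 as the hash, and the Jaccard coefficient as the similarity score. The sketch also walks the parser's event stream rather than building an explicit abstract syntax tree. All function and class names are hypothetical.

```python
import hashlib
from html.parser import HTMLParser

# Assumption: which elements count as "structural components".
COMPONENT_TAGS = {"form", "table"}


class ComponentExtractor(HTMLParser):
    """Collects a normalized serialization of each outermost component element."""

    def __init__(self):
        super().__init__()
        self.components = []
        self._open = None   # tag name of the component currently being collected
        self._nested = 0    # same-name elements nested inside that component
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._open is None:
            if tag not in COMPONENT_TAGS:
                return
            self._open, self._nested = tag, 0
        elif tag == self._open:
            self._nested += 1
        # Assumed normalization: keep attribute names, drop their values.
        names = sorted(name for name, _ in attrs)
        self._buf.append("<" + " ".join([tag] + names) + ">")

    def handle_endtag(self, tag):
        if self._open is None:
            return
        self._buf.append("</" + tag + ">")
        if tag == self._open:
            if self._nested == 0:
                self.components.append("".join(self._buf))
                self._open, self._buf = None, []
            else:
                self._nested -= 1


def component_hashes(html):
    """Steps d-f: normalize, split into components, hash each one into a set."""
    parser = ComponentExtractor()
    parser.feed(html)
    parser.close()
    return {hashlib.sha256(c.encode("utf-8")).hexdigest() for c in parser.components}


def similarity(a, b):
    """Assumed score: Jaccard coefficient of two hash value sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(suspect_html, known_hash_sets, threshold):
    """Steps g-i: require at least one shared hash, then score and threshold."""
    suspect = component_hashes(suspect_html)
    matched, best = False, 0.0
    for known in known_hash_sets:
        if suspect & known:
            matched = True
            best = max(best, similarity(suspect, known))
    return matched and best >= threshold, best
```

In this sketch, two pages that share a login form but differ in one table share one of three distinct component hashes, so raising or lowering `threshold` decides whether they are treated as the same kit. The same `similarity` function could serve as the basis of a distance (for example, one minus the score) for the clustering use mentioned in the abstract.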