Title of invention: Using hash signatures of DOM objects to identify website similarity
Abstract: Embodiments are directed to using a hash signature of a rendered DOM object of a website to find similar content and behavior on other websites. Embodiments break a DOM into a large number of data portions (i.e., "shingles"), apply a hashing algorithm to the shingles, select a predetermined number of hashes from the hashed shingles according to a selection criterion to create a hash signature, and compare the hash signature to that of a reference page to determine similarity of website DOM object content. Embodiments can be used to identify phishing websites, defaced websites, spam websites, significant changes in the content of a webpage, and copyright infringement, and for any other suitable purpose related to the similarity between website DOM object content.
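The pipeline described in the abstract (shingle, hash, select, compare) can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names, the overlapping character shingles of size 8, the use of SHA-1, and the "keep the 64 smallest hashes" selection criterion are all assumptions made for the example.

```python
import hashlib


def shingles(text, size=8):
    """Split text into overlapping fixed-length character shingles (assumed scheme)."""
    if len(text) <= size:
        return {text}
    return {text[i:i + size] for i in range(len(text) - size + 1)}


def hash_shingle(shingle):
    """Map one shingle to a 64-bit integer (SHA-1 is an arbitrary choice here)."""
    digest = hashlib.sha1(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signature(text, size=8, keep=64):
    """Hash every shingle and keep the `keep` smallest hashes (assumed criterion)."""
    hashes = {hash_shingle(s) for s in shingles(text, size)}
    return sorted(hashes)[:keep]


def similarity(sig_a, sig_b, keep=64):
    """Estimate set overlap from two signatures.

    Takes the `keep` smallest hashes of the combined signatures and returns
    the fraction of them that appear in both signatures.
    """
    smallest = sorted(set(sig_a) | set(sig_b))[:keep]
    if not smallest:
        return 0.0
    a, b = set(sig_a), set(sig_b)
    shared = sum(1 for h in smallest if h in a and h in b)
    return shared / len(smallest)


# Hypothetical serialized DOM content for a reference page and a candidate page.
reference = "<html><body><form action='/login'><input name='user'></form></body></html>"
candidate = "<html><body><form action='/login'><input name='user2'></form></body></html>"
score = similarity(signature(reference), signature(candidate))
```

A score close to 1.0 suggests the two DOM serializations share most of their shingles; what counts as "similar" depends on a threshold chosen by the operator.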
Publication number: US9386037(B1)  Publication date: 2016.07.05
Application number: US201514938814  Filing date: 2015.11.11
Applicant: RiskIQ Inc.  Inventors: Hunt Adam; Pon David; Kiernan Chris; Adams Ben; Edgeworth Jonas; Manousos Elias; Linn Joseph
Classification: H04L9/32; H04L29/06; G06F17/30  Main classification: H04L9/32
Agency: Kilpatrick Townsend & Stockton LLP  Agent: Kilpatrick Townsend & Stockton LLP
Principal claim: 1. A method for determining a similarity between two websites, the method comprising, at a computer system: receiving website information from a web server corresponding to a website; rendering a document object model (DOM) object of the website using the website information; separating content within the DOM object into a plurality of data portions, each of the plurality of data portions having a fixed length; generating, by a hardware processor of the computer system, a hash signature of the DOM object by: applying a predetermined number of hashing functions to each of the plurality of data portions, wherein the predetermined number of hashing functions are generated using a common seed value, and wherein applying the predetermined number of hashing functions results in a predetermined number of values for each of the plurality of data portions; and selecting, using a selection policy, a predetermined number of hashed data portions of the plurality of hashed data portions, wherein the predetermined number of hashed data portions are selected to create a hash signature of the DOM object; comparing the hash signature of the DOM object to a known hash signature of a DOM object associated with a known website having a first classification, wherein comparing the hash signature of the DOM object to the known hash signature of the DOM object associated with the known website includes comparing each of the plurality of hashed data portions to a plurality of known hashed data portions of the known hash signature; calculating a similarity measurement between the hash signature of the DOM object and the known hash signature of the DOM object associated with the known website; comparing the similarity measurement to a threshold; and determining that the website has the first classification based on the similarity measurement exceeding the threshold.
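One way to read the claimed steps (several hash functions derived from a common seed, one selected value per function, then a thresholded comparison) is as a MinHash-style scheme, sketched below. The claim names neither MinHash nor any specific selection policy, so the "minimum value per hash function" policy, the salted BLAKE2b hashes, the sliding-window portions, and every function name, parameter, and threshold here are assumptions made purely for illustration.

```python
import hashlib
import random


def make_hash_functions(count, seed):
    """Derive `count` hash functions from one common seed (assumed construction)."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(64).to_bytes(8, "big") for _ in range(count)]

    def make(salt):
        def hash_portion(portion):
            digest = hashlib.blake2b(
                portion.encode("utf-8"), digest_size=8, salt=salt
            ).digest()
            return int.from_bytes(digest, "big")
        return hash_portion

    return [make(salt) for salt in salts]


def minhash_signature(text, hash_functions, length=8):
    """Split text into fixed-length portions and keep the minimum hash per function."""
    last_start = max(1, len(text) - length + 1)
    portions = {text[i:i + length] for i in range(last_start)}
    return [min(h(p) for p in portions) for h in hash_functions]


def signature_similarity(sig_a, sig_b):
    """Fraction of signature positions at which the two signatures agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def classify(sig, known_sig, known_label, threshold=0.8):
    """Return the known label if similarity exceeds the threshold, else None."""
    if signature_similarity(sig, known_sig) > threshold:
        return known_label
    return None


# Hypothetical usage: compare a fetched page against a known phishing page.
functions = make_hash_functions(count=32, seed=42)
known = minhash_signature("<html><body>Sign in to your bank</body></html>", functions)
fetched = minhash_signature("<html><body>Sign in to your bank</body></html>", functions)
label = classify(fetched, known, "phishing")
```

Both signatures must be built with hash functions derived from the same seed; otherwise the position-by-position comparison is meaningless.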
Address: San Francisco, CA, US