Title: System and method for webpage analysis
Abstract: A system and method for classifying a webpage may include generating, by an analysis server, a first representation of a webpage. The system and method may further include generating, by a unit installed in a user's web browser, a second representation of the webpage, and the method may comprise producing a classification of the webpage by relating the first representation to the second representation.
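The abstract does not say how the two representations are related. As a hedged illustration only, the sketch below treats each representation as a list of tag strings and uses the standard-library `difflib.SequenceMatcher` to list what the browser-side representation adds relative to the server-side one. The diff is a stand-in chosen for brevity, not the matching procedure in the main claim; the names `injected_tokens` and `relate_representations` and the "modified"/"unmodified" labels are hypothetical.

```python
import difflib


def injected_tokens(server_repr, browser_repr):
    """Hypothetical helper: tokens present in the browser-side representation but not the server-side one."""
    matcher = difflib.SequenceMatcher(a=server_repr, b=browser_repr, autojunk=False)
    added = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if op in ("insert", "replace"):  # content present only on the browser side
            added.extend(browser_repr[j1:j2])
    return added


def relate_representations(server_repr, browser_repr):
    """Hypothetical two-label classifier: any browser-only content marks the page as modified."""
    return "modified" if injected_tokens(server_repr, browser_repr) else "unmodified"


server_view = ["<html>", "<body>", "<form>", "</form>", "</body>"]
browser_view = ["<html>", "<body>", "<form>", "<input name=pin>", "</form>", "</body>"]
print(injected_tokens(server_view, browser_view))  # → ['<input name=pin>']
```

A plain diff only reports what changed; it says nothing about whether the change is dangerous, which is why the claim adds a separate content test on the unmatched material.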
Publication number: US9614862 (B2)
Publication date: 2017.04.04
Application number: US201313949570
Filing date: 2013.07.24
Applicant: NICE LTD.
Inventors: Stern Nir; Cohen Ganor Shlomo; Farkash Rotem
Classification: H04L29/06; G06F12/14; G06F17/22
Main classification: H04L29/06
Agency: Pearl Cohen Zedek Latzer Baratz LLP
Agent: Pearl Cohen Zedek Latzer Baratz LLP
Main claim: 1. A method of classifying a webpage, the method comprising:
producing, by a hardware analysis server computer, a baseline pool, the baseline pool including data, the data including a plurality of normalized string representations of a respective plurality of webpages served by a web server, wherein producing each normalized string representation of the plurality of normalized string representations further comprises, for each webpage of the plurality of webpages:
creating an empty output string;
for each element of the webpage: i) determining whether the element is a known element, ii) if the element is a known element, adding the element and its attributes to the output string, and iii) if the element is not a known element, refraining from adding the element to the output string; and
setting the corresponding normalized string representation of the current webpage to the output string;
obtaining, from a web browser, a string representation of the webpage received by the web browser from the web server; and
producing, by the hardware analysis server, a classification of the webpage by i) determining a sequence having a maximum string length from all strings in the baseline pool that are a subsequence of the string representation; ii) selecting the minimal set of consecutive substrings from the sequence that is required to reconstruct the string representation; and iii) if any of the substrings in the sequence contains an HTML script tag, an HTML input tag, or a URL having a different domain name, classifying the webpage as suspect.
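The claim's classification steps admit more than one reading, so the following is a minimal Python sketch under stated assumptions, not the patented implementation. Assumed: `KNOWN_ELEMENTS` and `URL_ATTRS` are hypothetical whitelists; each normalized "string" is kept as a list of `(tag, attributes)` tokens rather than one concatenated string; step i) is read as choosing the baseline page with the longest common subsequence (LCS) with the observed page; step ii) as splitting the observed tokens into the fewest maximal runs not covered by that LCS; and step iii) as inspecting those uncovered runs, with "different domain" meaning a URL host other than the page's own domain.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical whitelist: the claim does not enumerate the "known elements".
KNOWN_ELEMENTS = {"html", "head", "body", "div", "p", "a", "form", "input", "script", "img"}
# Hypothetical set of attributes treated as carrying URLs.
URL_ATTRS = {"href", "src", "action"}


class _Normalizer(HTMLParser):
    """Keeps known start tags with their attributes; unknown elements are skipped."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_ELEMENTS:
            self.tokens.append((tag, tuple(attrs)))


def normalize(html):
    """Token list standing in for the claim's normalized string representation."""
    parser = _Normalizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def lcs_mask(base, observed):
    """Length of an LCS of base/observed, plus a mask of the observed tokens it uses."""
    n, m = len(base), len(observed)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if base[i] == observed[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    mask, i, j = [False] * m, 0, 0
    while i < n and j < m:  # walk the table to recover one LCS
        if base[i] == observed[j]:
            mask[j] = True
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return dp[0][0], mask


def uncovered_runs(observed, mask):
    """Maximal consecutive runs of observed tokens not covered by the LCS."""
    runs, current = [], []
    for token, covered in zip(observed, mask):
        if covered:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(token)
    if current:
        runs.append(current)
    return runs


def is_suspicious(run, page_domain):
    """Script/input tags, or a URL attribute whose host differs from page_domain."""
    for tag, attrs in run:
        if tag in ("script", "input"):
            return True
        for name, value in attrs:
            if name in URL_ATTRS and value:
                host = urlparse(value).hostname  # None for relative URLs
                if host is not None and host != page_domain:
                    return True
    return False


def classify(baseline_pool, observed, page_domain):
    # i) baseline page sharing the longest common subsequence with the observation
    best_len, best_mask = -1, [False] * len(observed)
    for base in baseline_pool:
        length, mask = lcs_mask(base, observed)
        if length > best_len:
            best_len, best_mask = length, mask
    # ii) fewest consecutive runs covering the tokens the baseline cannot explain
    runs = uncovered_runs(observed, best_mask)
    # iii) flag script/input tags or foreign-domain URLs in those runs
    return "suspect" if any(is_suspicious(r, page_domain) for r in runs) else "clean"


page = '<html><body><form action="https://bank.example/login"><p>Hi</p></form></body></html>'
tampered = page.replace("</form>", '<input name="pin"></form>')
print(classify([normalize(page)], normalize(tampered), "bank.example"))  # → suspect
```

Only the unexplained runs are inspected, so a legitimate `<input>` that already appears in the baseline page does not trigger the suspect label; an element outside the whitelist is dropped during normalization and therefore never flagged.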
Address: Ra'anana, IL