Title of invention |
Method and device for detecting phishing web page |
Abstract |
The embodiments of the present invention provide a method and a device for detecting a phishing web page. The method includes: judging whether a unique domain name corresponding to a to-be-detected web page exists in a trusted domain name database; if the unique domain name does not exist in the trusted domain name database, determining a similarity between a content characteristic extracted from the to-be-detected web page and a content characteristic of each template file in a template file database; and determining that the to-be-detected web page is a phishing web page if the similarity between the content characteristic extracted from the to-be-detected web page and a content characteristic of at least one template file is greater than a preset similarity threshold. In the embodiments of the present invention, accuracy of a result of detecting a phishing web page is improved. |
Publication number |
US9218482(B2) |
Publication date |
2015.12.22 |
Application number |
US201213689230 |
Filing date |
2012.11.29 |
Applicant |
Huawei Technologies Co., Ltd. |
Inventors |
Ma Shaobu; Guo Hui |
Classification numbers |
G06F21/50;G06F17/30;H04L29/06;G06F21/44;H04L29/08;H04L29/12 |
Main classification number |
G06F21/50 |
Patent agency |
Leydig, Voit & Mayer, Ltd. |
Agent |
Leydig, Voit & Mayer, Ltd. |
Principal claim |
1. A method for detecting a phishing web page, comprising:
judging whether a unique domain name corresponding to a to-be-detected web page exists in a trusted domain name database;
when the unique domain name does not exist in the trusted domain name database, determining a similarity between a content characteristic extracted from the to-be-detected web page and a content characteristic of each template file in a template file database comprising a phishing template database or a brand template database, wherein the content characteristic comprises at least: a coding format, a document object model, a word, and a number of words; and
determining that the to-be-detected web page is the phishing web page when the similarity between the content characteristic extracted from the to-be-detected web page and a content characteristic of at least one template file is greater than a preset similarity threshold;
wherein determining that the to-be-detected web page is the phishing web page further comprises the following:
(a) reading a template file from the template file database, and judging whether a coding format extracted from the to-be-detected web page is the same as a coding format in the template file;
(b) when the coding format extracted from the to-be-detected web page is the same as the coding format in the template file, judging whether an absolute value of a difference between the number of words extracted from the to-be-detected web page and the number of words in the template file falls within a preset range of a number similarity;
(c) when the number of words falls within the preset range of the number similarity, determining whether a word similarity between the words extracted from the to-be-detected web page and the words in the template file falls between a high preset value of the word similarity and a low preset value of the word similarity;
(d) when the word similarity falls between the high preset value of the word similarity and the low preset value of the word similarity, calculating a model similarity between a document object model extracted from the to-be-detected web page and a document object model in the template file; and
(e) when the model similarity is greater than a preset value of the model similarity or the word similarity is higher than the high preset value of the word similarity, determining that the to-be-detected web page is the phishing web page;
reading a next template file from the phishing template database or the brand template database, and repeating the above steps of (a) to (d) until a most similar template file is found according to the model similarity among multiple template files whose model similarity reaches the preset value of the model similarity;
wherein (b) is performed after (a), (c) is performed after (b), (d) is performed after (c) and (e) is performed after (d). |
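The ordered checks (a) to (e) in the claim can be read as a cascade of progressively more expensive filters. The following is a minimal sketch of that reading, not the patented implementation. The claim does not say how the word similarity or the model similarity is computed, so this sketch assumes Jaccard similarity over word sets and a `difflib.SequenceMatcher` ratio over a flattened DOM tag sequence. The `Features` type, all function names, and all threshold values are hypothetical, and returning as soon as the word similarity exceeds the high preset value is one possible interpretation of step (e).

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class Features:
    """Content characteristics named in the claim (representation is assumed)."""
    encoding: str                 # coding format, e.g. "utf-8"
    dom_tags: Tuple[str, ...]     # document object model, flattened to a tag sequence
    words: FrozenSet[str]         # words extracted from the page
    word_count: int               # number of words


# Hypothetical preset values; the claim only says they are "preset".
MAX_COUNT_DIFF = 20    # preset range of the number similarity
WORD_SIM_LOW = 0.5     # low preset value of the word similarity
WORD_SIM_HIGH = 0.9    # high preset value of the word similarity
MODEL_SIM_MIN = 0.8    # preset value of the model similarity


def word_similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two word sets (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def model_similarity(a: Tuple[str, ...], b: Tuple[str, ...]) -> float:
    """Similarity ratio of two DOM tag sequences (assumed metric)."""
    return SequenceMatcher(None, a, b).ratio()


def match_template(page: Features, templates: Dict[str, Features]) -> Optional[str]:
    """Return the name of the matching template, or None if no template matches."""
    best_name, best_score = None, -1.0
    for name, tpl in templates.items():
        # (a) coding formats must be the same
        if page.encoding != tpl.encoding:
            continue
        # (b) word counts must differ by no more than the preset range
        if abs(page.word_count - tpl.word_count) > MAX_COUNT_DIFF:
            continue
        # (c) word similarity must reach at least the low preset value
        w = word_similarity(page.words, tpl.words)
        if w < WORD_SIM_LOW:
            continue
        # (e), second branch: word similarity above the high preset value
        if w > WORD_SIM_HIGH:
            return name
        # (d) otherwise compare the document object models
        m = model_similarity(page.dom_tags, tpl.dom_tags)
        # (e), first branch: keep the most similar template above the preset value
        if m > MODEL_SIM_MIN and m > best_score:
            best_name, best_score = name, m
    return best_name


def is_phishing(domain: str, page: Features,
                trusted: Set[str], templates: Dict[str, Features]) -> bool:
    """Trusted domains are skipped; other pages are compared against the templates."""
    if domain in trusted:
        return False
    return match_template(page, templates) is not None
```

Ordering the checks this way means the DOM comparison, the most expensive step in this sketch, runs only for templates that already agree with the page on coding format, word count, and vocabulary.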
地址 |
Shenzhen CN |