Title of invention: Fraudulent page detection
Abstract: A method of determining whether a page is a fraudulent page, comprising the steps of: extracting a plurality of tokens from the page (403); for each token, calculating a token probability, being the probability of the token being in a fraudulent page (404); and, using the calculated token probabilities, calculating a page probability, being the probability of the page being a fraudulent page (405), wherein the token probability of a token being in a fraudulent page is calculated based on the number of fraudulent pages and the number of non-fraudulent pages that contain the token in a training corpus of fraudulent pages and non-fraudulent pages.
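The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not say how tokens are extracted, how counts become probabilities, or how token probabilities are combined. The regex tokenizer, the add-one smoothing, and the naive-Bayes-style log-odds combination are all assumptions, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(page_text):
    """Extract the set of distinct tokens on a page (assumed: lowercase alphanumerics)."""
    return set(re.findall(r"[a-z0-9]+", page_text.lower()))


def train(fraud_pages, genuine_pages):
    """Count, per token, how many fraudulent and non-fraudulent pages contain it."""
    fraud_counts, genuine_counts = Counter(), Counter()
    for page in fraud_pages:
        fraud_counts.update(tokenize(page))
    for page in genuine_pages:
        genuine_counts.update(tokenize(page))
    return fraud_counts, genuine_counts, len(fraud_pages), len(genuine_pages)


def token_probability(token, model):
    """Probability that a page containing `token` is fraudulent.

    Add-one smoothing is an assumption; it keeps the result strictly
    between 0 and 1 even for tokens never seen in training.
    """
    fraud_counts, genuine_counts, n_fraud, n_genuine = model
    fraud_rate = (fraud_counts[token] + 1) / (n_fraud + 2)
    genuine_rate = (genuine_counts[token] + 1) / (n_genuine + 2)
    return fraud_rate / (fraud_rate + genuine_rate)


def page_probability(page_text, model):
    """Combine token probabilities into one page probability (assumed: summed log-odds)."""
    log_odds = 0.0
    for token in tokenize(page_text):
        p = token_probability(token, model)
        log_odds += math.log(p) - math.log(1.0 - p)
    # Clamp so that math.exp cannot overflow on very long pages.
    log_odds = max(-700.0, min(700.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))


model = train(
    fraud_pages=["verify your account password now", "urgent verify password"],
    genuine_pages=["welcome to our branch opening hours", "contact our branch"],
)
suspicious = page_probability("please verify your password", model)
benign = page_probability("branch opening hours", model)
```

With this toy corpus, `suspicious` comes out above 0.5 and `benign` below it, because "verify" and "password" appear only in the fraudulent training pages while "branch" appears only in the non-fraudulent ones.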
Publication number: US8806622(B2); Publication date: 2014.08.12
Application number: US200912989006; Filing date: 2009.04.21
Applicant: SentryBay Limited; Inventors: Waterson David Lynch; Collins Bevan J; Vijapurapu Raghuram; Whittington Marcus Andrew
Classification: G06F7/04; G06F17/30; H04L29/06; H04L12/58; G06Q30/02; Main classification: G06F7/04
Agency: Arnold, Knobloch & Saunders, L.L.P.; Agent: Arnold, Knobloch & Saunders, L.L.P.; Knobloch Charles
Main claim: 1. A method of determining whether a page is a fraudulent page mimicking a page of a known genuine website, comprising the steps of: on a computer:
receiving at least one token database corresponding to a genuine website, of an entity wanting to protect customers from fraudulent websites that mimic the genuine website, the token database being custom generated in respect of the genuine website by or on behalf of the entity to contain data relating to tokens that occur in pages of the genuine website and/or likely to occur in fraudulent pages of one or more fraudulent websites mimicking the genuine website,
obtaining a page from a website purporting to be the genuine website,
extracting a plurality of tokens from the page,
for each token, determining a token probability from data in the token database corresponding to the genuine website, the token probability being a probability indicative of the token being in a page of the genuine website, wherein the token probability of a token being in a page from the genuine website is based on a number of fraudulent pages mimicking pages of the genuine website and a number of non-fraudulent pages corresponding to the genuine website and/or other genuine websites that contain the token, said fraudulent pages and non-fraudulent pages being in a training corpus provided by or on behalf of the entity, and wherein
using token probabilities determined for each token, calculating a page probability indicating the similarity of the page to a non-fraudulent page corresponding to the genuine website, and
if the page probability achieves or exceeds a similarity threshold and the URL of the website is not that of the genuine website, determining that the page is a fraudulent page mimicking a page of the genuine website.
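The final determining step of the claim combines two conditions: the page must look sufficiently like the genuine site, and its URL must not belong to the genuine site. The Python sketch below illustrates that decision only. The function name, the threshold value of 0.9, and the exact-hostname comparison are assumptions made for illustration; the claim does not specify how the URL is compared or what threshold is used.

```python
from urllib.parse import urlsplit


def is_fraudulent_mimic(page_url, similarity, genuine_hosts, threshold=0.9):
    """Return True when a page resembles the genuine site but is served from elsewhere.

    similarity    -- page probability indicating similarity to the genuine site (0..1)
    genuine_hosts -- set of lowercase hostnames of the genuine website (assumed input)
    threshold     -- similarity threshold (0.9 is an arbitrary illustrative value)
    """
    host = (urlsplit(page_url).hostname or "").lower()
    on_genuine_site = host in genuine_hosts
    return similarity >= threshold and not on_genuine_site


genuine = {"www.example-bank.test"}
lookalike = is_fraudulent_mimic("http://examp1e-bank.test/login", 0.97, genuine)
real_site = is_fraudulent_mimic("https://www.example-bank.test/login", 0.99, genuine)
unrelated = is_fraudulent_mimic("http://news.test/story", 0.10, genuine)
```

A page on the genuine host is never flagged however similar it is, and a dissimilar page is never flagged whatever its URL; only the combination of high similarity and a foreign URL yields a fraudulent determination.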
Address: GB