Title: Method, system and computer-readable storage medium for detecting trap of web-based perpetual calendar and building retrieval database using the same
Abstract: The present disclosure relates to a method, system and software executable by a processor associated with a non-transitory computer-readable storage medium for detecting a trap of web-based calendar pages and building a retrieval database. According to an aspect of the disclosure, detecting a trap of web-based calendar pages includes clustering, by a clustering module, URLs corresponding to web pages stored in a database according to a predetermined standard, generating a regular expression by analyzing a date pattern included in a clustering result, and detecting a cluster suspected of being a trap of web-based perpetual calendar pages using the generated regular expression.
Publication number: US9141697(B2) Publication date: 2015.09.22
Application number: US201113152017 Filing date: 2011.06.02
Applicant: NHN CORPORATION Inventors: Sim Dong Yun;Lee Chaehyun
Classification: G06F17/30 Main classification: G06F17/30
Agency: Greer Burns & Crain Ltd. Agent: Greer Burns & Crain Ltd.
Principal claim: 1. A method of detecting a web trap, the method comprising: regularizing uniform resource locators (URLs) based on a date pattern defined in a select URL; clustering the URLs corresponding to a web page stored in a database according to a predetermined standard based on the regularized URLs; generating a regular expression by analyzing the date pattern associated with the select URL in a clustering result; and detecting at least one unwanted URL being suspected of causing an unwanted number of requests associated with linking dynamic pages of a web-based calendar by using the generated regular expression.
Address: Seongnam-si KR
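
Illustrative sketch (not part of the patent record): the four steps recited in claim 1 (regularizing URLs by date pattern, clustering, generating a regular expression, and detecting suspected URLs) could be approximated as shown below. This is a minimal sketch under stated assumptions, not the patented implementation. The date shapes in DATE_SHAPES, the {DATE} placeholder, the function names, and the min_cluster_size threshold are all hypothetical; the record does not specify the "predetermined standard" or which date formats are recognized.

```python
import re
from collections import defaultdict

PLACEHOLDER = "{DATE}"  # hypothetical marker that stands in for a date in a URL

# Assumed date shapes (illustrative only): (matcher, regex fragment to emit).
DATE_SHAPES = [
    (re.compile(r"(?<!\d)\d{4}-\d{2}-\d{2}(?!\d)"), r"\d{4}-\d{2}-\d{2}"),
    (re.compile(r"(?<!\d)\d{4}/\d{2}/\d{2}(?!\d)"), r"\d{4}/\d{2}/\d{2}"),
    (re.compile(r"(?<!\d)\d{8}(?!\d)"), r"\d{8}"),
]


def regularize(url):
    """Replace the first date-like substring with PLACEHOLDER.

    Returns (regularized_url, date_fragment), or (url, None) if no date is found.
    """
    for matcher, fragment in DATE_SHAPES:
        regularized, count = matcher.subn(PLACEHOLDER, url, count=1)
        if count:
            return regularized, fragment
    return url, None


def cluster_urls(urls):
    """Group URLs that share the same regularized form and date shape."""
    clusters = defaultdict(list)
    for url in urls:
        key, fragment = regularize(url)
        if fragment is not None:  # URLs without a date are not clustered
            clusters[(key, fragment)].append(url)
    return clusters


def build_regex(key, fragment):
    """Turn a regularized URL into a regex that matches any date of that shape."""
    prefix, suffix = key.split(PLACEHOLDER, 1)
    return re.compile(re.escape(prefix) + fragment + re.escape(suffix))


def detect_traps(urls, min_cluster_size=3):
    """Return regexes for clusters large enough to be suspected calendar traps.

    min_cluster_size is an assumed threshold, not a value from the patent.
    """
    return [
        build_regex(key, fragment)
        for (key, fragment), members in cluster_urls(urls).items()
        if len(members) >= min_cluster_size
    ]
```

A crawler could test each candidate URL against the returned patterns with fullmatch and skip those that match, which is one way to avoid following the endless next-day or next-month links that a perpetual calendar generates.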