Title of invention |
Method, system and computer-readable storage medium for detecting trap of web-based perpetual calendar and building retrieval database using the same |
Abstract |
The present disclosure relates to a method, a system, and software executable by a processor associated with a non-transitory computer-readable storage medium for detecting a trap of web-based calendar pages and building a retrieval database. According to an aspect of the disclosure, detecting a trap of web-based calendar pages includes clustering, by a clustering module, URLs corresponding to web pages stored in a database according to a predetermined standard, generating a regular expression by analyzing a date pattern included in a clustering result, and detecting a cluster suspected of being a trap of web-based perpetual calendar pages using the generated regular expression. |
Publication number |
US9141697(B2) |
Publication date |
2015.09.22 |
Application number |
US201113152017 |
Filing date |
2011.06.02 |
Applicant |
NHN CORPORATION |
Inventors |
Sim Dong Yun;Lee Chaehyun |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Patent agency |
Greer Burns & Crain Ltd. |
Agent |
Greer Burns & Crain Ltd. |
Main claim |
1. A method of detecting a web trap, the method comprising:
regularizing uniform resource locators (URLs) based on a date pattern defined in a select URL;
clustering the URLs corresponding to a web page stored in a database according to a predetermined standard based on the regularized URLs;
generating a regular expression by analyzing the date pattern associated with the select URL in a clustering result; and
detecting at least one unwanted URL being suspected of causing an unwanted number of requests associated with linking dynamic pages of a web-based calendar by using the generated regular expression. |
Address |
Seongnam-si KR |
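
The four steps recited in claim 1 (regularize, cluster, generate a regular expression, detect) can be illustrated with a minimal Python sketch. It is not the patented implementation: the date formats recognized, the `<DATE>` placeholder, the use of identical regularized URLs as the clustering standard, and the `min_distinct_urls` threshold are all assumptions made for illustration, and every function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a "date pattern" is YYYY-MM, YYYY/MM, YYYY-MM-DD or YYYY/MM/DD.
# The record does not enumerate the formats the claimed method recognizes.
DATE_IN_URL = re.compile(r"(?<!\d)\d{4}([-/])\d{1,2}(?:\1\d{1,2})?(?!\d)")
DATE_FRAGMENT = r"\d{4}[-/]\d{1,2}(?:[-/]\d{1,2})?"
PLACEHOLDER = "<DATE>"  # hypothetical marker for a regularized date


def regularize(url):
    """Step 1: replace every date-like substring with a placeholder."""
    return DATE_IN_URL.sub(PLACEHOLDER, url)


def cluster_urls(urls):
    """Step 2: group URLs whose regularized forms are identical.

    Assumption: "identical regularized form" stands in for the claim's
    unspecified "predetermined standard".
    """
    clusters = defaultdict(set)
    for url in urls:
        clusters[regularize(url)].add(url)
    return clusters


def template_to_regex(template):
    """Step 3: turn a regularized template into a compiled regular expression."""
    literal_parts = (re.escape(part) for part in template.split(PLACEHOLDER))
    return re.compile(DATE_FRAGMENT.join(literal_parts))


def detect_calendar_traps(urls, min_distinct_urls=5):
    """Step 4: flag clusters that contain a date and many distinct URLs.

    Returns a dict mapping each suspect template to its generated regex.
    The threshold is an illustrative assumption, not a value from the patent.
    """
    suspects = {}
    for template, members in cluster_urls(urls).items():
        if PLACEHOLDER in template and len(members) >= min_distinct_urls:
            suspects[template] = template_to_regex(template)
    return suspects


if __name__ == "__main__":
    crawled = [f"http://example.com/cal/2011/{m:02d}/" for m in range(1, 13)]
    crawled += ["http://example.com/about",
                "http://example.com/post/2011/06/02/hello"]
    for template, pattern in detect_calendar_traps(crawled).items():
        print(template, pattern.pattern)
```

A crawler could apply each generated expression with `pattern.fullmatch(url)` to skip further URLs from a suspect cluster before requesting them. The date heuristic is deliberately loose and will also match non-date digit runs such as `1234/56`, so a production system would need stricter validation.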