Title: Systems and methods to control web scraping
Abstract: Systems and methods to control web scraping through a plurality of web servers using real-time access statistics are described. For example, in one embodiment a web request is categorized based at least in part on a type of data access. The web request can be processed based on threshold determinations. The processing can include blocking, delaying, timely replying, or prioritizing a response to the web request.
Publication No.: US9385928(B2); Publication date: 2016.07.05
Application No.: US201314061633; Filing date: 2013.10.23
Applicant: YellowPages.com LLC; Inventors: Petta Damon Layton; Mohs Bradley Keith
Classification: H04L29/06; H04L12/26; G06F17/30; G06F21/55; G06Q30/02; G06Q50/00; Main classification: H04L29/06
Agency: Alston & Bird LLP; Agent: Alston & Bird LLP
Main claim: 1. A system comprising:
  one or more network interfaces accessible from a network;
  one or more repositories to retain one or more of: web content; requester identification information; and/or web request history information;
  one or more processors coupled to at least one of the one or more network interfaces and to at least one of the one or more repositories, the one or more processors to execute instructions to:
    process a current web request that is received via the network and corresponds to a request for information;
    determine a request category corresponding to the current web request based at least in part on a type of data access corresponding to the current web request;
    determine a threshold based on the request category of the current web request, wherein the threshold is from a plurality of thresholds, each of which thresholds corresponds to a respective request category of web requests;
    determine identification information indicating a requester of the current web request;
    log the current web request in association with the requester;
    process information on past web requests of the requester to identify a characteristic of the past web requests;
    determine whether the requester of the current web request is a web scraper based on whether the characteristic of the past web requests exceeds the threshold;
    provide types of processing to handle the current web request, wherein the provided types of processing include at least blocking, delaying, timely replying, and prioritizing a response to the current web request;
    determine a type of processing based on the determining whether the requester of the current web request is a web scraper; and
    handle the current web request based on the type of processing, wherein (a) blocking comprises not responding to the current web request, (b) delaying comprises postponing a response to the current web request for a predetermined period of time, (c) timely replying comprises providing a timely response to the current web request, and (d) prioritizing comprises indicating a priority for a response to the current web request.
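The processor steps recited in the claim can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes that the "characteristic" of past requests is a count of requests in a sliding time window, that the requester is identified by an opaque string such as an IP address, and that categories are derived from URL path prefixes. The class name `ScrapeController`, the category names, the threshold values, and the policy that maps the scraper determination to one of the four processing types are all hypothetical.

```python
import time
from collections import defaultdict, deque
from enum import Enum


class Action(Enum):
    BLOCK = "block"            # do not respond to the request
    DELAY = "delay"            # postpone the response for a fixed period
    REPLY = "timely_reply"     # respond normally
    PRIORITIZE = "prioritize"  # indicate a priority for the response


# Hypothetical per-category thresholds (max past requests per window).
DEFAULT_THRESHOLDS = {"search": 30, "listing_detail": 60, "other": 300}


class ScrapeController:
    def __init__(self, thresholds=None, window=60.0):
        self.thresholds = dict(thresholds or DEFAULT_THRESHOLDS)
        self.window = window
        # Request log: (requester, category) -> timestamps of past requests.
        self.history = defaultdict(deque)

    def categorize(self, path):
        """Assumed categorization: type of data access inferred from the path."""
        if path.startswith("/search"):
            return "search"
        if path.startswith("/listing/"):
            return "listing_detail"
        return "other"

    def handle(self, requester_id, path, now=None):
        now = time.time() if now is None else now
        category = self.categorize(path)
        threshold = self.thresholds[category]

        log = self.history[(requester_id, category)]
        # Drop entries that fell out of the sliding window.
        while log and log[0] <= now - self.window:
            log.popleft()
        past_count = len(log)  # characteristic of the past requests
        log.append(now)        # log the current request for this requester

        is_scraper = past_count > threshold
        # Hypothetical policy mapping the determination to a processing type.
        if is_scraper and past_count > 2 * threshold:
            return Action.BLOCK
        if is_scraper:
            return Action.DELAY
        if past_count <= threshold / 2:
            return Action.PRIORITIZE
        return Action.REPLY


if __name__ == "__main__":
    controller = ScrapeController(thresholds={"search": 2, "other": 300})
    for t in range(6):
        action = controller.handle("203.0.113.7", "/search?q=pizza", now=float(t))
        print(t, action.value)
```

With a search threshold of 2, repeated search requests from one requester move from prioritized, to a normal reply, to delayed, and finally to blocked as the past-request count grows; once the window has elapsed, the same requester is treated as a light user again.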
Address: Glendale CA US