Independent claim |
1. A method for crawling a web site, comprising:
receiving, at a server device, log data from a plurality of web browsers, the log data indicating users accessing the web site through the web browsers; using, at the server device, the log data to estimate traffic to the web site during a timeframe; determining, by the server device, a threshold frequency of page requests for the web site during the timeframe based on the estimate of traffic; determining, at the server device, a crawl rate during the timeframe that is less than the threshold frequency of page requests; and using the crawl rate to schedule one or more web crawlers to request the web site. |
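
The claimed steps can be illustrated with a minimal Python sketch. It is not the claimed implementation. Everything named here is a hypothetical assumption introduced for illustration: the `LogEntry` record shape, the one-hour timeframe, the `sample_rate` used to scale browser logs up to a traffic estimate, the `fraction` policy that derives the threshold from the estimate, and the `safety` factor that keeps the crawl rate below the threshold. The claim itself only requires that the threshold be based on the traffic estimate and that the crawl rate be less than the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One browser-reported page view (hypothetical record shape)."""
    site: str
    hour: int  # hour of day (0-23) in which the page was accessed


def estimate_traffic(entries, site, hour, sample_rate=0.25):
    """Estimate page views for `site` during `hour` from browser log data.

    Assumption: only a `sample_rate` fraction of users report logs,
    so the observed count is scaled up accordingly.
    """
    observed = sum(1 for e in entries if e.site == site and e.hour == hour)
    return observed / sample_rate


def threshold_frequency(estimated_views, fraction=0.25):
    """Assumed policy: crawler requests may not exceed `fraction` of user traffic."""
    return estimated_views * fraction


def crawl_rate(threshold, safety=0.5):
    """Choose a rate below the threshold (strictly below for any positive threshold)."""
    return threshold * safety


def schedule(rate_per_hour, timeframe_seconds=3600):
    """Return evenly spaced request offsets, in seconds, within the timeframe."""
    count = int(rate_per_hour)
    if count == 0:
        return []
    interval = timeframe_seconds / count
    return [i * interval for i in range(count)]


# Usage: 40 logged views of example.com during hour 14, plus unrelated entries.
logs = [LogEntry("example.com", 14)] * 40 + [LogEntry("other.org", 14)] * 5
views = estimate_traffic(logs, "example.com", 14)
limit = threshold_frequency(views)
rate = crawl_rate(limit)
offsets = schedule(rate)
```

In this sketch the schedule spreads requests evenly across the timeframe. A real scheduler could equally distribute the same budget among several crawlers, which the claim permits ("one or more web crawlers").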