Abstract |
<p>A system and method provide a minimal impact crawler (144) for searching and retrieving information on a distributed network. A policy engine (116) receives a request for a specific item and assembles policies for the target site containing information about the specific item. The policies are rules that govern the crawl (144) of a target site (128). The crawler (144) applies the policies to schedule crawls (144) of the target site (128) and stores the data retrieved from each crawl (144) in a historical database (104), so that future requests can be satisfied from the stored data. A scheduling engine automatically schedules crawls (144) at the beginning and at the end of an auction to minimize the number of crawls (144) on an auction site. The crawler (144) employs a plurality of minions (144) to retrieve crawl (144) requests and crawl (144) the target web sites (128) to obtain the necessary data.</p> |
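
The scheduling and caching behavior summarized above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `Policy`, `HistoricalStore`, `schedule_crawls`, and `lookup` are hypothetical, the in-memory dictionary merely stands in for the historical database (104), and the `fetch` callable stands in for a minion crawling the target site (128).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule set governing crawls of one target site."""
    site: str
    crawl_at_start: bool = True
    crawl_at_end: bool = True


@dataclass
class HistoricalStore:
    """In-memory stand-in (assumption) for the historical database."""
    records: dict = field(default_factory=dict)

    def get(self, item_id):
        return self.records.get(item_id)

    def put(self, item_id, data):
        self.records[item_id] = data


def schedule_crawls(policy, auction_start, auction_end):
    """Return the crawl times the policy allows: at most one at the
    start of the auction and one at the end."""
    times = []
    if policy.crawl_at_start:
        times.append(auction_start)
    if policy.crawl_at_end:
        times.append(auction_end)
    return times


def lookup(item_id, store, fetch):
    """Serve a request from stored data when possible; crawl only on a miss."""
    cached = store.get(item_id)
    if cached is not None:
        return cached
    data = fetch(item_id)      # stands in for a minion crawling the site
    store.put(item_id, data)   # keep the result for future requests
    return data
```

Under these assumptions, a seven-day auction yields exactly two scheduled crawls, and a repeated request for the same item reaches the target site only once because the second request is answered from the store.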