摘要 |
A method and system is disclosed for associating an appropriate web crawl interval with a document so that the probability of the document's stale content being used by a search engine is below an acceptable level when the search engine crawls the document at its associated web crawl interval. The web crawl interval of a document is determined through an iterative process and updated dynamically by the search engine after every visit to the document by a web crawler. A multi-tier data structure is employed for managing the web crawl order of billions of documents on the Internet. The search engine may move a document from one tier to another if its web crawl interval is changed significantly.
|