Title of invention |
Minimizing visibility of stale content in web searching including revising web crawl intervals of documents |
Abstract |
A method includes comparing a first instance with a second instance of a document in a plurality of documents. The first instance is obtained from a remote location at a specified time before the second instance is obtained from the remote location, and (i) the specified time is determined in accordance with a first crawl interval associated with the document, (ii) each document in the plurality of documents is assigned to a tier in a plurality of tiers, each tier having a distinct associated range of web crawl intervals, and (iii) the first crawl interval is assigned a first tier. The method also includes computing a second crawl interval for the document, which is a function of the document comparison, and determining whether the second crawl interval is in the first tier. When the second crawl interval is not in the first tier, the document is reassigned to another tier. |
Publication number |
US8782032(B2) |
Publication date |
2014.07.15 |
Application number |
US201313849355 |
Filing date |
2013.03.22 |
Applicant |
Google Inc. |
Inventor |
Carver Anton P. T. |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Agency |
Morgan, Lewis & Bockius LLP |
Agent |
Morgan, Lewis & Bockius LLP |
Principal claim |
1. A method for scheduling a document crawl interval, comprising:
at a computer system having one or more processors and a memory storing one or more programs for execution by the one or more processors: comparing a first instance of a document in a plurality of documents with a second instance of the document, thereby obtaining a document comparison, wherein the first instance of the document is obtained from a remote location at a specified time before the second instance of the document is obtained from the remote location and wherein
(i) the specified time is determined in accordance with a first crawl interval associated with the document,
(ii) each document in the plurality of documents is assigned to a crawl-scheduling tier in a plurality of crawl-scheduling tiers, each crawl-scheduling tier in the plurality of crawl-scheduling tiers having a distinct associated range of web crawl intervals, and
(iii) the first crawl interval is assigned a first crawl-scheduling tier in the plurality of crawl-scheduling tiers; and
computing a second crawl interval for the document, wherein the second crawl interval is a function of the document comparison; and
determining whether the second crawl interval is in the crawl-scheduling first tier, wherein, when the second crawl interval is not in the crawl-scheduling first tier, the first document is reassigned to a crawl-scheduling tier in the plurality of crawl-scheduling tiers other than the first crawl-scheduling tier. |
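The steps recited in the claim (compare two instances, compute a second crawl interval, check it against the current tier, reassign if it falls outside) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the tier names and hour ranges in `TIERS`, the halve-or-double rule in `compute_second_interval`, the use of plain string equality as the document comparison, and the function names themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """A crawl-scheduling tier with a distinct range of crawl intervals (hours)."""
    name: str
    min_interval: float  # inclusive lower bound
    max_interval: float  # exclusive upper bound

    def contains(self, interval: float) -> bool:
        return self.min_interval <= interval < self.max_interval


# Hypothetical tiers with non-overlapping ranges; the claim only requires
# that each tier has a distinct associated range.
TIERS = [
    Tier("daily", 0.0, 24.0),
    Tier("weekly", 24.0, 168.0),
    Tier("monthly", 168.0, float("inf")),
]


def compute_second_interval(first_interval: float, changed: bool) -> float:
    """Assumed rule: crawl twice as often if the document changed, half as often if not."""
    return first_interval / 2 if changed else first_interval * 2


def tier_for(interval: float) -> Tier:
    """Return the tier whose range contains the given interval."""
    for tier in TIERS:
        if tier.contains(interval):
            return tier
    raise ValueError(f"no tier covers interval {interval}")


def reschedule(first_instance: str, second_instance: str,
               first_interval: float, first_tier: Tier) -> tuple[float, Tier]:
    """Compare two instances, compute the second interval, and reassign the tier if needed."""
    # Document comparison: simple equality stands in for whatever comparison is used.
    changed = first_instance != second_instance
    second_interval = compute_second_interval(first_interval, changed)
    if first_tier.contains(second_interval):
        return second_interval, first_tier  # interval still fits the current tier
    return second_interval, tier_for(second_interval)  # reassign to another tier
```

For example, under these assumptions a document in the hypothetical "weekly" tier with a 30-hour interval whose content changed would get a 15-hour interval and move to the "daily" tier, while an unchanged document would get a 60-hour interval and stay in "weekly".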
Address |
Mountain View CA US |