Title of invention: Minimizing visibility of stale content in web searching including revising web crawl intervals of documents
Abstract: A method includes comparing a first instance with a second instance of a document in a plurality of documents. The first instance is obtained from a remote location at a specified time before the second instance is obtained from the remote location, and (i) the specified time is determined in accordance with a first crawl interval associated with the document, (ii) each document in the plurality of documents is assigned to a tier in a plurality of tiers, each tier having a distinct associated range of web crawl intervals, and (iii) the first crawl interval is assigned a first tier. The method also includes computing a second crawl interval for the document, which is a function of the document comparison, and determining whether the second crawl interval is in the first tier. When the second crawl interval is not in the first tier, the document is reassigned to another tier.
Publication number: US8782032(B2) Publication date: 2014.07.15
Application number: US201313849355 Filing date: 2013.03.22
Applicant: Google Inc. Inventor: Carver Anton P. T.
Classification: G06F17/30 Main classification: G06F17/30
Agency: Morgan, Lewis & Bockius LLP Agent: Morgan, Lewis & Bockius LLP
Principal claim: 1. A method for scheduling a document crawl interval, comprising: at a computer system having one or more processors and a memory storing one or more programs for execution by the one or more processors: comparing a first instance of a document in a plurality of documents with a second instance of the document, thereby obtaining a document comparison, wherein the first instance of the document is obtained from a remote location at a specified time before the second instance of the document is obtained from the remote location and wherein (i) the specified time is determined in accordance with a first crawl interval associated with the document, (ii) each document in the plurality of documents is assigned to a crawl-scheduling tier in a plurality of crawl-scheduling tiers, each crawl-scheduling tier in the plurality of crawl-scheduling tiers having a distinct associated range of web crawl intervals, and (iii) the first crawl interval is assigned a first crawl-scheduling tier in the plurality of crawl-scheduling tiers; and computing a second crawl interval for the document, wherein the second crawl interval is a function of the document comparison; and determining whether the second crawl interval is in the crawl-scheduling first tier, wherein, when the second crawl interval is not in the crawl-scheduling first tier, the first document is reassigned to a crawl-scheduling tier in the plurality of crawl-scheduling tiers other than the first crawl-scheduling tier.
Address: Mountain View CA US
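
The steps recited in the principal claim (compare two instances of a document, compute a second crawl interval from that comparison, then check whether the interval still falls in the document's tier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the tier names and hour ranges, the use of `difflib.SequenceMatcher` as the comparison, the 0.9 similarity threshold, and the halve-or-grow rule in `next_interval`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical tiers: (name, lowest interval in hours, upper bound in hours).
# Each tier covers a distinct, non-overlapping range of crawl intervals.
TIERS = [
    ("hourly", 0.0, 24.0),
    ("daily", 24.0, 168.0),
    ("weekly", 168.0, float("inf")),
]


@dataclass
class CrawlRecord:
    url: str
    interval_hours: float
    tier: str


def tier_for(interval_hours: float) -> str:
    """Return the name of the tier whose range contains the interval."""
    for name, low, high in TIERS:
        if low <= interval_hours < high:
            return name
    raise ValueError(f"no tier covers interval {interval_hours}")


def compare_instances(first: str, second: str) -> float:
    """Document comparison: similarity in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, first, second).ratio()


def next_interval(current_hours: float, similarity: float,
                  changed_below: float = 0.9) -> float:
    """Assumed rule: crawl sooner if the document changed, later if not."""
    factor = 0.5 if similarity < changed_below else 1.5
    return max(1.0, current_hours * factor)


def reschedule(record: CrawlRecord, first: str, second: str) -> CrawlRecord:
    """Compute the second crawl interval and reassign the tier if needed."""
    similarity = compare_instances(first, second)
    new_interval = next_interval(record.interval_hours, similarity)
    if tier_for(new_interval) == record.tier:
        # Second interval is still inside the first tier: keep the tier.
        return CrawlRecord(record.url, new_interval, record.tier)
    # Otherwise move the document to the tier covering the new interval.
    return CrawlRecord(record.url, new_interval, tier_for(new_interval))
```

Under these assumptions, a document crawled every 30 hours that has not changed moves to a 45-hour interval and stays in the "daily" tier, while one whose content changed completely drops to 15 hours and is reassigned to "hourly".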