Title Assigning document identification tags
Abstract Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.
Publication number US9411889(B2) Publication date 2016.08.09
Application number US201213419349 Filing date 2012.03.13
Applicant Google Inc. Inventors Zhu Huican;Acharya Anurag
Classification G06F17/00;G06F17/30;H04L29/08;G06Q30/02;H04L12/24;H04L29/06 Main classification G06F17/00
Agency Agent
Principal claim 1. A computer-implemented method of assigning a document identifier to a new document, the new document to be added to a collection of documents, the method being performed on a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors, the method comprising: partitioning a set of document identifiers into a plurality of segments, each segment associated with a respective subset of the set of document identifiers, wherein the document identifiers comprise a predetermined set of monotonically ordered document identification tags; subdividing each of the segments into a plurality of tiers, wherein each tier is associated with a respective subset of the set of document identifiers, and wherein the plurality of tiers are monotonically ordered with respect to a query-independent document importance metric; receiving query-independent information about the new document, the information including a value of the query-independent document importance metric and a unique document identifier for the new document; selecting, based at least in part on the unique document identifier, one of the segments; selecting, based at least on the query-independent information, one of the tiers associated with the selected segment; assigning to the new document a document identifier from the respective subset of document identifiers associated with the selected tier, the assigned document identifier not previously assigned to any of the documents in the collection of documents; and repeating the receiving, selecting a segment, selecting a tier, and assigning, with respect to one or more additional new documents.
Address Mountain View CA US