Title of invention: METHODS AND SYSTEMS FOR INDEXING REFERENCES TO DOCUMENTS OF A DATABASE AND FOR LOCATING DOCUMENTS IN THE DATABASE
Abstract: Methods and systems allow indexing references to documents of a database according to database reference profiles. Documents may then be located in the database using decoding protocols based on the database reference profiles. To this end, the documents are stored in the database and searchable terms extracted therefrom are associated with posting lists. Each posting list is divided into blocks of M database references. The blocks are encoded according to a pattern that depends on the M database references. A corresponding pointer to a table of encoding patterns is appended to each block. When a query is received for a searchable term, blocks are extracted from a posting list corresponding to the searchable term and a pointer for each block is used to extract a decoding protocol related to an encoding pattern for the block.
Publication number: US2015356169(A1); Publication date: 2015.12.10
Application number: US201514829665; Filing date: 2015.08.19
Applicant: YANDEX EUROPE AG; Inventor: POPOV Petr Sergeevich
Classification: G06F17/30; Main classification: G06F17/30
Patent agency: (blank); Agent: (blank)
Main claim: 1. A method for indexing references to documents of a database, the method comprising: receiving a document, at the database, from a server; storing the document in the database; extracting a searchable term from the document, the searchable term being associated with a posting list; dividing the posting list into blocks, each block comprising M database references; for each block: determining an encoding pattern based on values of the M database references, the determining the encoding pattern comprises: determining a number n of patches according to a number of references, among the M database references, that are greater than or equal to 2^b; and if n>0: calculating, for each of n patches, a patch value v_k by deleting b least significant bits from a corresponding one of the M database references that are greater than or equal to 2^b, wherein k is in a range from 1 to n, and determining, for each of the n patches, a patch position p_k corresponding to a position, in a range of 0 to M−1, of the corresponding one of the M database references that are greater than or equal to 2^b; wherein the encoding pattern comprises b, n, p_1 ... p_n, v_1 ... v_n; locating an encoding pattern table entry corresponding to the encoding pattern; inserting a pointer corresponding to the located encoding pattern table entry in a header for the block; and inserting in the block a sequence of M truncated references, each truncated reference comprising b least significant bits of a corresponding one of the M database references.
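The per-block encoding recited in the claim can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the names `Pattern`, `encode_block` and `decode_block` are hypothetical, the bit width b is assumed to be supplied by the caller (the claim does not say how b is selected), the number of patches n is represented implicitly as the length of the position tuple, and the pattern table is assumed to be a dictionary that gains a new entry when a pattern is first seen, whereas the claim only recites locating an entry. The decoder is included only to show that the pattern plus the truncated references recover the original block.

```python
from typing import Dict, List, NamedTuple, Tuple


class Pattern(NamedTuple):
    """Hypothetical encoding pattern: b, p_1..p_n, v_1..v_n (n = len(positions))."""
    b: int
    positions: Tuple[int, ...]  # patch positions p_k, each in 0..M-1
    values: Tuple[int, ...]     # patch values v_k (reference with b low bits deleted)


def encode_block(refs: List[int], b: int, table: Dict[Pattern, int]):
    """Encode one block of M references; return (pointer, truncated references)."""
    limit = 1 << b  # 2^b: references at or above this need a patch
    patched = [(i, r >> b) for i, r in enumerate(refs) if r >= limit]
    pattern = Pattern(b,
                      tuple(i for i, _ in patched),
                      tuple(v for _, v in patched))
    # Assumption: a pattern not yet in the table is added with the next free index.
    pointer = table.setdefault(pattern, len(table))
    # Each truncated reference keeps only the b least significant bits.
    truncated = [r & (limit - 1) for r in refs]
    return pointer, truncated


def decode_block(pointer: int, truncated: List[int],
                 patterns_by_pointer: Dict[int, Pattern]) -> List[int]:
    """Rebuild the M references from the block header pointer and truncated values."""
    pattern = patterns_by_pointer[pointer]
    refs = list(truncated)
    for p, v in zip(pattern.positions, pattern.values):
        refs[p] |= v << pattern.b  # restore the high bits removed by truncation
    return refs


# Usage with M = 4 and b = 2: only 18 is >= 2^2, so there is one patch.
table: Dict[Pattern, int] = {}
pointer, truncated = encode_block([3, 1, 18, 2], 2, table)
by_pointer = {ptr: pat for pat, ptr in table.items()}
restored = decode_block(pointer, truncated, by_pointer)
```

Because the pattern holds only b, the patch positions and the patch values, two blocks whose oversized references sit at the same positions with the same high bits map to the same table entry and can share one pointer.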
Address: Luzern CH