发明名称 Inverted index and inverted list process for storing and retrieving information
摘要 A process is disclosed for the computer management of inverted lists and inverted indices, in which the standard representation and processing of inverted lists is changed in order to achieve a simpler, more compact and more efficient architecture.
申请公布号 US9600578(B1) 申请公布日期 2017.03.21
申请号 US201414259701 申请日期 2014.04.23
申请人 Sacco Giovanni M. 发明人 Sacco Giovanni M.
分类号 G06F17/30 主分类号 G06F17/30
代理机构 Young & Thompson 代理人 Young & Thompson
主权项 1. A method of using a computer to efficiently produce an ordered list of pointers from N inverted lists, said list of pointers containing all and only the pointers that are stored in one of the inverted lists and also in each of the other lists, the method comprising: storing each inverted list as an ordered sequence of index records; storing each index record with an encoding of N pointers, N being larger than or equal to one, and N being no larger than a total number of pointers in the inverted list, and allowing access to the value of a minimum pointer and of a maximum pointer stored in the index record, each index record being contiguous or non-contiguous, an index record being contiguous when the index record stores all possible pointer values between the minimum and the maximum pointer values extremes included, the index record being non-contiguous when the index record does not store all the possible pointer values between the minimum and the maximum pointer values extremes included; sequentially reading each inverted list through a cursor C, C consisting of a component C.pos identifying the current index record in the ordered list of index records, and of a component C.cont representing a group of pointers in the form <ps, pe>, where C.cont.ps identifies the minimum pointer and C.cont.pe the maximum pointer in the group, C.cont being set during the operation of reading the current index record identified by C.pos and C.cont.ps being set to the minimum pointer in the current index record and C.cont.pe being set to the maximum pointer in the current index record during said operation; and advancing the cursor to the next pointer wherein when the current pointer C.cont.ps is not the last pointer in the current record, C.cont.ps is set to the next pointer in the current record, and C.cont.ps is returned, and otherwise when the current pointer C.cont.ps is the last pointer in the current record, the next record in the inverted list is read, when said record does not exist an end-of-scan condition being returned, otherwise C.cont.ps being returned, each cursor C also supporting an operation that advances said cursor C to a value x, where: a) while x>C.cont.pe, the cursor is advanced to the next record, b) when x≦C.cont.ps, the cursor is not advanced, c) when C.cont.ps<x≦C.cont.pe, the cursor is advanced to the next pointer until x≦C.cont.ps; each list J in said N inverted lists having a cursor C[J], said operation to produce an ordered list of pointers from N inverted lists, said list of pointers containing all and only the pointers that are stored in one of the inverted lists and also in each of the other lists, comprising the following steps: a. compute minp as the minimum value of C[J].cont.pe for all the cursors and maxp as the maximum value of C[J].cont.ps for all the cursors; b. when minp≧maxp then compute the intersection among the lists in the interval <maxp, minp> and advance each cursor to minp+1, otherwise advance each cursor to maxp, iterate in both cases steps a and b until any cursor returns an end-of-scan condition during advancing; wherein: when all cursors are defined on a contiguous index record, the output of the intersection in the interval <maxp, minp> is given by all the possible pointer values in said interval; otherwise, the output of the intersection in the interval <maxp, minp> is computed by ignoring all the cursors defined on a contiguous index record, and outputting all the values in the interval <maxp, minp> extremes included, that appear in all the remaining cursors.
地址 Turin IT