Title of invention: Inverted indexes for accelerating analytics queries
Abstract: The disclosed embodiments provide a system that processes data. During operation, the system obtains a set of records, wherein each of the records comprises one or more metrics and at least one dimension associated with the one or more metrics. Next, the system creates, in a data segment comprising the records, an inverted index for a column in the records based on a cardinality of the column. Finally, the system compresses the inverted index based on a jump value associated with record identifiers in the column.
Publication number: US8762387(B1) Publication date: 2014.06.24
Application number: US201313956223 Filing date: 2013.07.31
Applicant: LinkedIn Corporation Inventors: Patel Dhaval; Dubey Sanjay; Naga Praveen N.; Zhabiuk Volodymyr; Jung Jintae
Classification: G06F17/30 Main classification: G06F17/30
Agency: Park, Vaughan, Fleming & Dowler LLP Agent: Park, Vaughan, Fleming & Dowler LLP; Suen Chia-Hsin
Principal claim: 1. A computer-implemented method for processing data, comprising: obtaining a set of records, wherein each of the records comprises one or more metrics and at least one dimension associated with the one or more metrics; creating, in a data segment comprising the records, an inverted index for a column in the records based on a cardinality of the column; and compressing the inverted index based on a jump value associated with record identifiers in the column, which comprises: a computer determining the jump value based on a threshold associated with compressing the inverted index; and the computer including a record identifier in the compressed inverted index if a difference between the record identifier and a consecutive record identifier in the inverted index is greater than the jump value; wherein the threshold is associated with a proportion of the record identifiers to be included in the compressed inverted index.
Address: Mountain View, CA, US
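
Illustrative sketch (not part of the patent record): the steps recited in claim 1 could be read roughly as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation. All function names, the max_cardinality cutoff, the choice to compare each record identifier with the one preceding it, the rule that the first identifier is always kept, and the "smallest jump that fits the proportion" search are hypothetical choices made for illustration. The record only states that the index is created based on column cardinality, that the jump value is determined from a threshold tied to the proportion of identifiers to include, and that an identifier is included when its difference from a consecutive identifier exceeds the jump value.

```python
from collections import defaultdict


def build_inverted_index(records, column, max_cardinality):
    """Map each value of `column` to the sorted record ids that hold it.

    Assumption: the index is only built when the column has at most
    `max_cardinality` distinct values; otherwise None is returned.
    """
    index = defaultdict(list)
    for record_id, record in enumerate(records):
        index[record[column]].append(record_id)
    if len(index) > max_cardinality:
        return None
    return dict(index)


def compress_posting_list(record_ids, jump):
    """Keep an id only if it lies more than `jump` past the preceding id.

    Assumption: the first id has no predecessor and is always kept.
    """
    kept = []
    previous = None
    for record_id in record_ids:
        if previous is None or record_id - previous > jump:
            kept.append(record_id)
        previous = record_id
    return kept


def choose_jump(record_ids, max_proportion):
    """Pick the smallest jump that keeps at most `max_proportion` of the ids.

    Only 0 and the gaps that actually occur are tried, since the compressed
    list changes only at those values. Falls back to the largest gap.
    """
    gaps = sorted({b - a for a, b in zip(record_ids, record_ids[1:])})
    budget = max_proportion * len(record_ids)
    for jump in [0] + gaps:
        if len(compress_posting_list(record_ids, jump)) <= budget:
            return jump
    return gaps[-1] if gaps else 0


# Records with one dimension ("country") and one metric ("clicks").
records = [
    {"country": "US", "clicks": 3},
    {"country": "CA", "clicks": 1},
    {"country": "US", "clicks": 7},
]
index = build_inverted_index(records, "country", max_cardinality=10)

# A longer posting list: three runs of ids starting at 0, 10 and 40.
ids = [0, 1, 2, 3, 10, 11, 12, 40]
jump = choose_jump(ids, max_proportion=0.5)
compressed = compress_posting_list(ids, jump)
```

Under these assumptions, the compressed list for the sample identifiers retains only the first identifier of each densely packed run, which is one plausible reading of why a gap-based rule shrinks the index for columns whose values cluster in adjacent records.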