Title of invention |
Inverted indexes for accelerating analytics queries |
Abstract |
The disclosed embodiments provide a system that processes data. During operation, the system obtains a set of records, wherein each of the records comprises one or more metrics and at least one dimension associated with the one or more metrics. Next, the system creates, in a data segment comprising the records, an inverted index for a column in the records based on a cardinality of the column. Finally, the system compresses the inverted index based on a jump value associated with record identifiers in the column. |
Publication number |
US8762387(B1) |
Publication date |
2014.06.24 |
Application number |
US201313956223 |
Filing date |
2013.07.31 |
Applicant |
LinkedIn Corporation |
Inventors |
Patel Dhaval; Dubey Sanjay; Naga Praveen N.; Zhabiuk Volodymyr; Jung Jintae |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Patent agency |
Park, Vaughan, Fleming & Dowler LLP |
Patent agent |
Park, Vaughan, Fleming & Dowler LLP; Suen Chia-Hsin |
Principal claim |
1. A computer-implemented method for processing data, comprising:
obtaining a set of records, wherein each of the records comprises one or more metrics and at least one dimension associated with the one or more metrics;
creating, in a data segment comprising the records, an inverted index for a column in the records based on a cardinality of the column; and
compressing the inverted index based on a jump value associated with record identifiers in the column, which comprises:
a computer determining the jump value based on a threshold associated with compressing the inverted index; and
the computer including a record identifier in the compressed inverted index if a difference between the record identifier and a consecutive record identifier in the inverted index is greater than the jump value;
wherein the threshold is associated with a proportion of the record identifiers to be included in the compressed inverted index.
|
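The selection rule recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the function names `build_inverted_index`, `choose_jump_value`, and `compress_postings` are hypothetical; "consecutive record identifier" is read here as the preceding identifier in the sorted list; the threshold is read as the maximum fraction of identifiers that may be kept; and the first identifier, which has no predecessor, is not kept. The sketch shows only which identifiers are selected, not how omitted identifiers would be recovered at query time.

```python
from collections import defaultdict


def build_inverted_index(column):
    """Map each distinct column value to the ascending list of record ids holding it."""
    index = defaultdict(list)
    for record_id, value in enumerate(column):
        index[value].append(record_id)
    return dict(index)


def _gaps(record_ids):
    """Differences between each record id and the one before it."""
    return [cur - prev for prev, cur in zip(record_ids, record_ids[1:])]


def choose_jump_value(record_ids, proportion):
    """Smallest jump value that keeps at most `proportion` of the ids.

    Assumption: `proportion` plays the role of the claim's threshold.
    """
    budget = int(proportion * len(record_ids))
    gaps = _gaps(record_ids)
    # Only observed gap sizes (plus 0) can change how many ids are kept.
    for jump in sorted(set(gaps) | {0}):
        kept = sum(1 for gap in gaps if gap > jump)
        if kept <= budget:
            return jump
    return 0  # not reached: the largest observed gap always keeps zero ids


def compress_postings(record_ids, jump):
    """Keep a record id only if its gap to the preceding id exceeds `jump`."""
    return [cur for prev, cur in zip(record_ids, record_ids[1:]) if cur - prev > jump]


index = build_inverted_index(["us", "us", "ca", "us"])
ids = [0, 1, 2, 3, 10, 11, 12, 40, 41]
jump = choose_jump_value(ids, proportion=0.25)
compressed = compress_postings(ids, jump)
```

With the sample identifiers above, the gaps are 1, 1, 1, 7, 1, 1, 28, 1, so a budget of two identifiers yields a jump value of 1 and only the identifiers that follow the two large gaps are kept.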
地址 |
Mountain View CA US |