发明名称 |
Encoding of variable-length data with unary formats |
摘要 |
Embodiments provide methods and systems for encoding and decoding variable-length data, which may include methods for encoding and decoding search engine posting lists. Embodiments may include different encoding formats including group unary, packed unary, and/or packed binary formats. Some embodiments may utilize single instruction multiple data (SIMD) instructions that may perform a parallel shuffle operation on encoded data as part of the decoding processes. Some embodiments may utilize lookup tables to determine shuffle sequences and/or masks and/or shifts to be utilized in the decoding processes. Some embodiments may utilize hybrid formats. |
申请公布号 |
US9336225(B2) |
申请公布日期 |
2016.05.10 |
申请号 |
US201113077479 |
申请日期 |
2011.03.31 |
申请人 |
A9.com, Inc. |
发明人 |
Rose Daniel E.;Stepanov Alexander A.;Gangolli Anil Ramesh;Oberoi Paramjit S.;Ernst Ryan Jacob |
分类号 |
G06F17/30;H03M7/40;H03M7/46 |
主分类号 |
G06F17/30 |
代理机构 |
Hogan Lovells US LLP |
代理人 |
Hogan Lovells US LLP |
主权项 |
1. A computer-implemented method for encoding document identification numbers for a search engine posting list comprising:
determining a block size for data storage; receiving a plurality of document identification numbers for the search engine posting list; determining differences between adjacent document identification numbers of the plurality of document identification numbers; determining a plurality of encoded representations for the differences between adjacent document identification numbers, wherein each encoded representation of the plurality of encoded representations is a variable-length representation; identifying a sequential subset of the plurality of encoded representations, wherein a sum of a respective size of each encoded representation in the sequential subset is less than or equal to the block size for data storage; generating one or more descriptors for the sequential subset, the one or more descriptors providing information regarding the respective size of each encoded representation in the sequential subset in a packed unary representation, the packed unary representation packing a unary descriptor comprising a plurality of continuation bits into least-significant bits of an encoded representation; and storing the one or more descriptors and the sequential subset of the plurality of encoded representations. |
地址 |
Palo Alto CA US |