发明名称 Encoding of variable-length data with unary formats
摘要 Embodiments provide methods and systems for encoding and decoding variable-length data, which may include methods for encoding and decoding search engine posting lists. Embodiments may include different encoding formats including group unary, packed unary, and/or packed binary formats. Some embodiments may utilize single instruction multiple data (SIMD) instructions that may perform a parallel shuffle operation on encoded data as part of the decoding processes. Some embodiments may utilize lookup tables to determine shuffle sequences and/or masks and/or shifts to be utilized in the decoding processes. Some embodiments may utilize hybrid formats.
申请公布号 US9336225(B2) 申请公布日期 2016.05.10
申请号 US201113077479 申请日期 2011.03.31
申请人 A9.com, Inc. 发明人 Rose Daniel E.;Stepanov Alexander A.;Gangolli Anil Ramesh;Oberoi Paramjit S.;Ernst Ryan Jacob
分类号 G06F17/30;H03M7/40;H03M7/46 主分类号 G06F17/30
代理机构 Hogan Lovells US LLP 代理人 Hogan Lovells US LLP
主权项 1. A computer-implemented method for encoding document identification numbers for a search engine posting list comprising: determining a block size for data storage; receiving a plurality of document identification numbers for the search engine posting list; determining differences between adjacent document identification numbers of the plurality of document identification numbers; determining a plurality of encoded representations for the differences between adjacent document identification numbers, wherein each encoded representation of the plurality of encoded representations is a variable-length representation; identifying a sequential subset of the plurality of encoded representations, wherein a sum of a respective size of each encoded representation in the sequential subset is less than or equal to the block size for data storage; generating one or more descriptors for the sequential subset, the one or more descriptors providing information regarding the respective size of each encoded representation in the sequential subset in a packed unary representation, the packed unary representation packing a unary descriptor comprising a plurality of continuation bits into least-significant bits of an encoded representation; and storing the one or more descriptors and the sequential subset of the plurality of encoded representations.
地址 Palo Alto CA US