发明名称 Compression of small strings
摘要 A method for compressing a set of small strings may include calculating n-gram frequencies for a plurality of n-grams over the set of small strings, selecting a subset of n-grams from the plurality of n-grams based on the calculated n-gram frequencies, defining a mapping table that maps each n-gram of the subset of n-grams to a unique code, and compressing the set of small strings by replacing n-grams within each small string in the set of small strings with corresponding unique codes from the mapping table. The method may use linear optimization to select a subset of n-grams that achieves a maximum space saving amount over the set of small strings for inclusion in the mapping table. The unique codes may be variable-length one or two byte codes. The set of small strings may be domain names.
申请公布号 US8924446(B2) 申请公布日期 2014.12.30
申请号 US201113339562 申请日期 2011.12.29
申请人 Verisign, Inc. 发明人 Thomas Matthew;Perroud Benoit
分类号 H03M7/30 主分类号 H03M7/30
代理机构 MH2 Technology Law Group, LLP 代理人 MH2 Technology Law Group, LLP
主权项 1. A computer-implemented method for compressing a set of small strings, the method comprising: calculating, by a processor, n-gram frequencies for a plurality of n-grams over the set of small strings; selecting, by the processor, a subset of n-grams from the plurality of n-grams based on the calculated n-gram frequencies; defining, by the processor, a mapping table that maps each n-gram of the subset of n-grams to a unique code, wherein the unique codes are variable-length one or two byte codes; and compressing, by the processor, the set of small strings by replacing n-grams within each small string in the set of small strings with corresponding unique codes from the mapping table.
地址 Reston VA US