发明名称 |
Compression of small strings |
摘要 |
A method for compressing a set of small strings may include calculating n-gram frequencies for a plurality of n-grams over the set of small strings, selecting a subset of n-grams from the plurality of n-grams based on the calculated n-gram frequencies, defining a mapping table that maps each n-gram of the subset of n-grams to a unique code, and compressing the set of small strings by replacing n-grams within each small string in the set of small strings with corresponding unique codes from the mapping table. The method may use linear optimization to select a subset of n-grams that achieves a maximum space saving amount over the set of small strings for inclusion in the mapping table. The unique codes may be variable-length one or two byte codes. The set of small strings may be domain names. |
申请公布号 |
US8924446(B2) |
申请公布日期 |
2014.12.30 |
申请号 |
US201113339562 |
申请日期 |
2011.12.29 |
申请人 |
Verisign, Inc. |
发明人 |
Thomas Matthew;Perroud Benoit |
分类号 |
H03M7/30 |
主分类号 |
H03M7/30 |
代理机构 |
MH2 Technology Law Group, LLP |
代理人 |
MH2 Technology Law Group, LLP |
主权项 |
1. A computer-implemented method for compressing a set of small strings, the method comprising:
calculating, by a processor, n-gram frequencies for a plurality of n-grams over the set of small strings; selecting, by the processor, a subset of n-grams from the plurality of n-grams based on the calculated n-gram frequencies; defining, by the processor, a mapping table that maps each n-gram of the subset of n-grams to a unique code, wherein the unique codes are variable-length one or two byte codes; and compressing, by the processor, the set of small strings by replacing n-grams within each small string in the set of small strings with corresponding unique codes from the mapping table. |
地址 |
Reston VA US |