摘要 |
<p>A method for compressing a set of small strings may include calculating n-gram frequencies for a plurality of n-grams over the set of small strings, selecting a subset of n-grams from the plurality of n-grams based on the calculated n-gram frequencies, defining a mapping table that maps each n-gram of the subset of n-grams to a unique code, and compressing the set of small strings by replacing n-grams within each small string in the set of small strings with corresponding unique codes from the mapping table. The method may use linear optimization to select a subset of n-grams that achieves a maximum space saving amount over the set of small strings for inclusion in the mapping table. The unique codes may be variable-length one or two byte codes. The set of small strings may be domain names.
</p> |