Methods and systems for generating histograms for strings are described. In one implementation, a prefix tree is generated whose nodes represent prefixes of the strings. Deploy weights are assigned to the nodes of the prefix tree based on the lengths of the prefixes represented by the sub-tree nodes rooted at each node and on the frequencies of the strings whose prefixes those sub-tree nodes represent. Each deploy weight of a node indicates the maximum weight preserved when the buckets are filled with at least one prefix represented by the sub-tree nodes rooted at that node. A predefined number of Top-prefixes is determined for filling the predefined number of buckets. The Top-prefixes are determined by maximizing the total weight preserved by the prefixes in the buckets, over a maximum number of strings. A histogram is generated based on the deploy weights associated with the Top-prefixes.
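The selection step summarized above can be illustrated with a small sketch. It is one possible reading of the abstract, not the claimed method: the weight model (each string contributes the length of its longest selected prefix), the tree-knapsack recursion, and the names build_trie, merge, solve and top_prefixes are all assumptions introduced here for illustration. The per-node table returned by solve plays a role similar to the deploy weights, since entry b holds the best weight preserved with at most b buckets spent inside that node's sub-tree.

```python
def build_trie(strings):
    """Prefix tree; each node stores how many strings pass through it."""
    root = {"count": 0, "children": {}}
    for s in strings:
        node = root
        node["count"] += 1
        for ch in s:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def merge(a, b, k):
    """Knapsack-combine two tables indexed by bucket budget 0..k."""
    out = []
    for i in range(k + 1):
        best = None
        for j in range(i + 1):
            value = a[j][0] + b[i - j][0]
            if best is None or value > best[0]:
                best = (value, a[j][1] + b[i - j][1])
        out.append(best)
    return out


def solve(node, prefix, anc_depth, k):
    """Return table[b] = (weight, prefixes) using at most b buckets in this
    sub-tree, given that the nearest selected ancestor has length anc_depth.
    Assumed weight model: a string contributes the length of its longest
    selected prefix. No memoization, so this suits small inputs only."""

    def children_table(anc):
        table = [(0, [])] * (k + 1)
        for ch, child in node["children"].items():
            table = merge(table, solve(child, prefix + ch, anc, k), k)
        return table

    skip = children_table(anc_depth)      # this prefix is not selected
    if not prefix:                        # the root is the empty prefix
        return skip
    # Extra weight gained by selecting this prefix on top of its ancestor.
    gain = node["count"] * (len(prefix) - anc_depth)
    below = children_table(len(prefix))   # this prefix is selected
    table = [skip[0]]
    for b in range(1, k + 1):
        value, chosen = below[b - 1]
        take = (gain + value, [prefix] + chosen)
        table.append(take if take[0] > skip[b][0] else skip[b])
    return table


def top_prefixes(strings, num_buckets):
    """Best (total weight, selected prefixes) for the given bucket budget."""
    return solve(build_trie(strings), "", 0, num_buckets)[num_buckets]
```

Under these assumptions, calling top_prefixes on three copies of "apple", one "apply", two copies of "apt" and one "bat" with two buckets selects "appl" and "apt": "appl" preserves 4 characters for each of 4 strings and "apt" preserves 3 characters for each of 2 strings. A histogram could then report, for each selected prefix, the number of strings assigned to it.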
Application publication number
WO2014176754(A1)
Application publication date
2014.11.06
Application number
WO2013CN75033
Filing date
2013.04.30
Applicant
HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;LUO, GE;JIAO, LI-MEI;CAO, ZHAO;CHEN, SHIMIN;GUO, MENG