发明名称 Tracking set-expression cardinalities over continuous update streams
摘要 A method of estimating set-expression cardinalities over data streams with guaranteed small maintenance time per data-element update. The method only examines each data element once and uses a limited amount of memory. The time-efficient stream synopsis extends 2-level hash-sketches by randomly, but uniformly, pre-hashing data-elements prior to logarithmically hashing them to a first-level hash-table. This generates a set of independent 2-level hash-sketches. The set-union cardinality can be estimated by determining the smallest hash-bucket index j at which only a predetermined fraction of the b hash-buckets has a non-empty union |A∪B|. Once a set-union cardinality is estimated, general set-expression cardinalities may be estimated by counting witness elements for the set-expression, i.e., those first-level hash-buckets that are both a singleton for the set-expression and a set-union singleton. The set-expression cardinality is the set-union cardinality times the number of witness elements divided by the number of hash-buckets.
申请公布号 US2006143218(A1) 申请公布日期 2006.06.29
申请号 US20040025355 申请日期 2004.12.29
申请人 LUCENT TECHNOLOGIES, INC. 发明人 GANGULY SUMIT;GAROFALAKIS MINOS;RASTOGI RAJEEV
分类号 G06F7/00;G06F17/00 主分类号 G06F7/00
代理机构 代理人
主权项
地址