Title: Approximate order statistics of real numbers in generic data
Abstract: A method, system, and processor-readable storage medium are directed towards calculating approximate order statistics on a collection of real numbers. In one embodiment, the collection of real numbers is processed to create a digest comprising a hierarchy of buckets. Each bucket is assigned a real number N having P digits of precision and ordinality O. The hierarchy is defined by grouping buckets into levels, where each level contains all buckets of a given ordinality. Each individual bucket in the hierarchy defines a range of numbers: all numbers that, after being truncated to that bucket's P digits of precision, are equal to that bucket's N. Each bucket additionally maintains a count of how many numbers have fallen within that bucket's range. Approximate order statistics may then be calculated by traversing the hierarchy and performing an operation on some or all of the ranges and counts associated with each bucket.
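The bucket hierarchy described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The class name `Digest`, its methods, and the choice to bound a rank from below and above are all assumptions made for illustration. The sketch also assumes non-negative inputs supplied as decimal strings, and it does not model parent and child links between levels.

```python
from collections import defaultdict
from decimal import Decimal


class Digest:
    """Toy digest (hypothetical): levels keyed by ordinality O, buckets keyed by N."""

    def __init__(self):
        # levels[O][N] -> count of numbers recorded in bucket N at ordinality O
        self.levels = defaultdict(lambda: defaultdict(int))

    def add(self, text):
        """Record one non-negative number given as a decimal string."""
        n = Decimal(text)
        # Power of ten of the last written digit, e.g. -1 for "12.5".
        ordinality = n.as_tuple().exponent
        self.levels[ordinality][n] += 1

    @staticmethod
    def bucket_range(n, ordinality):
        """Half-open range [low, high) of values that truncate to n."""
        return n, n + Decimal(1).scaleb(ordinality)

    def rank_bounds(self, query):
        """Return (lower, upper) bounds on how many recorded values are <= query."""
        q = Decimal(query)
        lower = upper = 0
        for ordinality, buckets in self.levels.items():
            for n, count in buckets.items():
                low, high = self.bucket_range(n, ordinality)
                if high <= q:
                    lower += count  # the whole bucket range lies below q
                if low <= q:
                    upper += count  # the bucket range may hold values <= q
        return lower, upper


digest = Digest()
for value in ["12.5", "12.5", "12.7", "13", "20"]:
    digest.add(value)

print(digest.rank_bounds("12.7"))  # (2, 3)
```

Here the bucket for 12.5 at ordinality -1 covers [12.5, 12.6), so both of its entries are certain to be at or below 12.7, while the bucket for 12.7 covers [12.7, 12.8) and only might be. This is why the sketch reports a pair of bounds rather than a single rank.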
Publication number: US8756262(B2) Publication date: 2014.06.17
Application number: US201113038085 Filing date: 2011.03.01
Applicant: Splunk Inc. Inventor: Zhang Steve Yu
Classification: G06F17/18 Main classification: G06F17/18
Agency: Hickman Palermo Truong Becker Bingham Wong LLP Agent: Hickman Palermo Truong Becker Bingham Wong LLP; Wong Kirk D.
Principal claim: 1. A computer-implemented method for calculating approximate order statistics from a collection of floating point numbers from a digest in a network comprising: receiving machine data, wherein the machine data includes a floating point number; extracting, using one or more processors, the floating point number from the machine data; determining, using the one or more processors, an ordinality of the floating point number, wherein the ordinality of each floating point number is determined by converting the floating point number to a mantissa and an exponent and subtracting a number of significant digits in the mantissa to the right of the decimal point including significant zeros from the exponent; identifying, using the one or more processors and based on the determined ordinality, a level from amongst a plurality of levels in the digest, the digest being stored in a non-transitory memory and including a plurality of buckets positioned along the plurality of levels, wherein each bucket of the plurality of buckets is: defined by the ordinality of the level along which it is positioned, further defined by a range limited by one or more extrema, and associated with a count that reflects a quantity of floating point numbers; identifying, using the one or more processors, a bucket positioned at the identified level and being defined by a range that is inclusive of the floating point number; incrementing, using the one or more processors, the count of the identified bucket, wherein the identified bucket, for which the count was incremented, has a plurality of child buckets in the digest, wherein the digest is configured to be used to generate a response to a query based on the incremented count of the bucket; identifying, using the one or more processors, a set of buckets based on a query value in the query, wherein the set of buckets includes the identified bucket; and estimating, using the one or more processors, an order statistic for the query value based on a summation of counts associated with the identified set of buckets.
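The ordinality step recited in the claim can be worked through in a short Python sketch. The function name `ordinality` is hypothetical, and the sketch assumes the number arrives as a finite decimal literal in text form, so that trailing zeros written in the source count as significant. It shows one reading of the claim language, not the patentee's code.

```python
from decimal import Decimal


def ordinality(text):
    """Ordinality of a finite decimal literal, following the claim's wording."""
    _sign, digits, exp = Decimal(text).as_tuple()
    # In scientific form d.ddd x 10^E, one digit sits before the decimal point
    # and the remaining digits (including significant zeros) sit after it.
    fractional_digits = len(digits) - 1
    sci_exponent = exp + fractional_digits  # E in d.ddd x 10^E
    # Claim: subtract the count of mantissa digits right of the point from E.
    return sci_exponent - fractional_digits


print(ordinality("12.5"))   # 1.25 x 10^1  -> 1 - 2 = -1
print(ordinality("1200"))   # 1.200 x 10^3 -> 3 - 3 = 0
print(ordinality("1.2E3"))  # 1.2 x 10^3   -> 3 - 1 = 2
```

Under this reading the result always equals the power of ten of the last written digit, which is also the exponent field that Python's `Decimal` stores internally. The two-step arithmetic is kept only to mirror the claim text.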
Address: San Francisco, CA, US