Abstract
A method, system, and processor-readable storage medium are directed towards calculating approximate order statistics on a collection of real numbers. In one embodiment, the collection of real numbers is processed to create a digest comprising a hierarchy of buckets. Each bucket is assigned a real number N having P digits of precision and ordinality O. The hierarchy is defined by grouping buckets into levels, where each level contains all buckets of a given ordinality. Each individual bucket in the hierarchy defines a range of numbers: all numbers that, after being truncated to that bucket's P digits of precision, are equal to that bucket's N. Each bucket additionally maintains a count of how many numbers have fallen within that bucket's range. Approximate order statistics may then be calculated by traversing the hierarchy and performing an operation on some or all of the buckets' ranges and counts.
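The bucket ranges and levels described above can be sketched with Python's `decimal` module, which keeps decimal digits exact. This is a minimal illustration under stated assumptions, not the patented implementation. The helper names `unit`, `bucket_value`, `bucket_range`, and `child_values` are hypothetical. A bucket's precision is identified by its ordinality O, so its width is 10^O. Values are floored rather than truncated so that negative numbers also get half-open ranges [N, N + 10^O). For non-negative numbers, flooring and truncation agree.

```python
from decimal import ROUND_FLOOR, Decimal


def unit(ordinality: int) -> Decimal:
    """Width of a bucket at the given ordinality: exactly 10 ** ordinality."""
    return Decimal(1).scaleb(ordinality)


def bucket_value(x: Decimal, ordinality: int) -> Decimal:
    """Value N of the bucket at `ordinality` whose range contains x.

    Assumption: x is floored to a multiple of 10 ** ordinality. For x >= 0
    this is the same as truncating x to that many digits of precision.
    """
    return x.quantize(unit(ordinality), rounding=ROUND_FLOOR)


def bucket_range(n: Decimal, ordinality: int) -> tuple[Decimal, Decimal]:
    """Half-open range [N, N + 10 ** ordinality) covered by bucket N."""
    return n, n + unit(ordinality)


def child_values(n: Decimal, ordinality: int) -> list[Decimal]:
    """Values of the ten child buckets one level below (ordinality - 1)."""
    step = unit(ordinality - 1)
    return [n + k * step for k in range(10)]


if __name__ == "__main__":
    x = Decimal("1.2345")
    # The same number falls into one bucket per level; buckets widen as ordinality grows.
    for o in (-4, -3, -2, -1, 0):
        n = bucket_value(x, o)
        lo, hi = bucket_range(n, o)
        print(f"ordinality {o:>2}: N = {n}, range [{lo}, {hi})")
```

Each bucket at ordinality O splits into ten children at ordinality O - 1. The children's ranges tile the parent's range exactly, which is what makes the levels a hierarchy.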
Main Claim
1. A computer-implemented method for calculating approximate order statistics from a collection of floating point numbers from a digest in a network comprising:
receiving machine data, wherein the machine data includes a floating point number;
extracting, using one or more processors, the floating point number from the machine data;
determining, using the one or more processors, an ordinality of the floating point number, wherein the ordinality of each floating point number is determined by converting the floating point number to a mantissa and an exponent and subtracting a number of significant digits in the mantissa to the right of the decimal point including significant zeros from the exponent;
identifying, using the one or more processors and based on the determined ordinality, a level from amongst a plurality of levels in the digest, the digest being stored in a non-transitory memory and including a plurality of buckets positioned along the plurality of levels, wherein each bucket of the plurality of buckets is:
    defined by the ordinality of the level along which it is positioned,
    further defined by a range limited by one or more extrema, and
    associated with a count that reflects a quantity of floating point numbers;
identifying, using the one or more processors, a bucket positioned at the identified level and being defined by a range that is inclusive of the floating point number;
incrementing, using the one or more processors, the count of the identified bucket, wherein the identified bucket, for which the count was incremented, has a plurality of child buckets in the digest, wherein the digest is configured to be used to generate a response to a query based on the incremented count of the bucket;
identifying, using the one or more processors, a set of buckets based on a query value in the query, wherein the set of buckets includes the identified bucket; and
estimating, using the one or more processors, an order statistic for the query value based on a summation of counts associated with the identified set of buckets.
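The claimed steps can be sketched end to end as follows. This is a hedged illustration, not the claimed system. The `key=value` log format, the regular expression, the names `extract_number`, `ordinality`, `Digest`, and `rank`, and the rank estimator are all assumptions. The number is kept as decimal text rather than converted to a binary float, so its significant digits survive extraction, including trailing zeros such as the `0` in `12.50`. Ordinality follows the claim literally: the scientific exponent minus the count of mantissa digits to the right of the decimal point. The claim only requires a summation of bucket counts. Here, as one plausible choice, buckets lying entirely below the query contribute their full count, and a bucket straddling the query contributes a linearly interpolated fraction.

```python
import re
from collections import defaultdict
from decimal import ROUND_FLOOR, Decimal
from typing import Optional

# Hypothetical pattern for a decimal literal such as 12, -0.045 or 1.5e3.
_NUMBER = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"


def extract_number(line: str, field: str) -> Optional[Decimal]:
    """Pull `field=<number>` out of a machine-data line, keeping its digits as written."""
    match = re.search(rf"\b{re.escape(field)}=({_NUMBER})", line)
    return Decimal(match.group(1)) if match else None


def ordinality(x: Decimal) -> int:
    """Scientific exponent minus the number of significant mantissa digits
    to the right of the decimal point (significant zeros included)."""
    if not x.is_finite():
        raise ValueError(f"not a finite number: {x}")
    digits = x.as_tuple().digits
    exponent = x.adjusted()          # exponent when written as d.ddd x 10**exponent
    digits_right = len(digits) - 1  # mantissa digits after the decimal point
    return exponent - digits_right


class Digest:
    """Levels keyed by ordinality; each level maps a bucket value N to a count."""

    def __init__(self) -> None:
        self.levels: dict[int, dict[Decimal, int]] = defaultdict(lambda: defaultdict(int))
        self.total = 0

    def add(self, x: Decimal) -> None:
        o = ordinality(x)                              # pick the level
        width = Decimal(1).scaleb(o)
        n = x.quantize(width, rounding=ROUND_FLOOR)    # bucket [n, n + width) holding x
        self.levels[o][n] += 1                         # increment that bucket's count
        self.total += 1

    def rank(self, q: Decimal) -> Decimal:
        """Estimated number of values below q (assumed linear interpolation)."""
        estimate = Decimal(0)
        for o, buckets in self.levels.items():
            width = Decimal(1).scaleb(o)
            for n, count in buckets.items():
                if n + width <= q:                     # bucket lies entirely below q
                    estimate += count
                elif n < q:                            # bucket straddles q: take a fraction
                    estimate += count * (q - n) / width
        return estimate


if __name__ == "__main__":
    lines = [
        "host=web-1 latency_ms=1.2 status=200",
        "host=web-2 latency_ms=1.25 status=200",
        "host=web-1 latency_ms=3 status=500",
        "host=web-3 latency_ms=3 status=200",
        "host=web-2 latency_ms=12 status=200",
    ]
    digest = Digest()
    for line in lines:
        value = extract_number(line, "latency_ms")
        if value is not None:
            digest.add(value)
    q = Decimal("3.5")
    print(f"estimated rank of {q}: {digest.rank(q)} of {digest.total}")
```

A value is stored at its own ordinality, so the bucket it lands in has N equal to the value itself. The range [N, N + 10^O) expresses the uncertainty implied by the value's precision. For the five sample values, a query of 3.5 counts 1.2 and 1.25 in full and half of the two values in bucket [3, 4), giving an estimate of 3.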