Title of invention: APPROXIMATE DISTINCT COUNTING IN A BOUNDED MEMORY
Abstract: A table is processed to determine an approximate number of distinct values (NDV) for a plurality of groups. For each row, a group is identified based on one or more group-by columns. A hashed value is generated by applying a uniform hash function to a value in an NDV column. The hashed value is assigned to a particular bucket based on the values at a first set of bit positions in a binary representation of the hashed value. A bit position value is determined for a remaining portion of the binary representation of the hashed value. The bit position value is based on a number of ordered bits in the hashed value that match a particular bit pattern. For each group identified, a maximum bit position (MBP) table is generated. The MBP table stores, for one or more buckets, the maximum bit position value determined for the hashed values assigned to that bucket.
Publication number: US2017024387(A1) Publication date: 2017.01.26
Application number: US201514818663 Filing date: 2015.08.05
Applicant: Oracle International Corporation Inventors: Su Hong; Zait Mohamed; Chakkappen Sunil
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim: 1. A method comprising: processing rows in a portion of a table to determine an approximate number of distinct values (NDV) in an NDV column for a plurality of groups specified in a set of one or more group-by columns, wherein each row is processed by: identifying a group based on one or more row values in the set of one or more group-by columns; generating a hashed value by applying a uniform hash function to a row value in the NDV column; assigning the hashed value to a particular bucket based on values at a first set of bit positions in a binary representation of the hashed value; and determining a bit position value for a remaining portion of the binary representation of the hashed value, wherein the first set of bit positions are excluded from the remaining portion, wherein the bit position value is based on a number of ordered bits in the hashed value that match a particular bit pattern; for each group identified while processing the rows, generating a maximum bit position (MBP) table comprising, for one or more buckets, a maximum bit position value determined for hashed values assigned to a particular bucket of the one or more buckets; determining one or more approximate NDVs for one or more particular groups of the plurality of groups based on one or more MBP tables for the one or more particular groups; wherein the method is performed by one or more computing devices.
Address: Redwood Shores CA US
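
The per-row steps recited in the principal claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions that the record does not specify: a 64-bit hash taken from BLAKE2b, the top 6 bits as the bucket selector (64 buckets), "leading zero bits followed by a 1" as the bit pattern, and a HyperLogLog-style estimator (with the usual small-range correction) for turning an MBP table into an approximate NDV. The names `hash64`, `bucket_and_bit_position`, `build_mbp_tables`, and `estimate_ndv` are hypothetical.

```python
import hashlib
import math
from collections import defaultdict

BUCKET_BITS = 6                      # assumed: first 6 bits select the bucket
NUM_BUCKETS = 1 << BUCKET_BITS       # 64 buckets per MBP table
HASH_BITS = 64                       # assumed hash width
REST_BITS = HASH_BITS - BUCKET_BITS  # bits left for the bit position value
ALPHA = 0.709                        # HyperLogLog bias constant for 64 buckets


def hash64(value):
    """Stand-in for the uniform hash function: 64-bit BLAKE2b digest."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bucket_and_bit_position(h):
    """Split a 64-bit hash into (bucket, bit position value).

    The bucket comes from the first BUCKET_BITS bits. The bit position
    value is computed on the remaining bits only: the count of leading
    zeros plus one (an assumed choice of bit pattern).
    """
    bucket = h >> REST_BITS
    rest = h & ((1 << REST_BITS) - 1)
    bit_position = REST_BITS - rest.bit_length() + 1
    return bucket, bit_position


def build_mbp_tables(rows, group_cols, ndv_col):
    """Return {group key: MBP table}, one fixed-size table per group."""
    tables = defaultdict(lambda: [0] * NUM_BUCKETS)
    for row in rows:
        group = tuple(row[c] for c in group_cols)
        bucket, pos = bucket_and_bit_position(hash64(row[ndv_col]))
        mbp = tables[group]
        if pos > mbp[bucket]:
            mbp[bucket] = pos        # keep only the maximum per bucket
    return dict(tables)


def estimate_ndv(mbp):
    """HyperLogLog-style estimate from one MBP table (assumed estimator)."""
    m = len(mbp)
    raw = ALPHA * m * m / sum(2.0 ** -p for p in mbp)
    zeros = mbp.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw


if __name__ == "__main__":
    rows = [{"region": r, "customer": f"{r}-{i % 500}"}
            for r in ("east", "west") for i in range(2000)]
    for group, mbp in build_mbp_tables(rows, ["region"], "customer").items():
        print(group, round(estimate_ndv(mbp)))
```

Each group costs a fixed 64 small integers regardless of how many rows or distinct values it contains, which is what keeps memory bounded; duplicate values hash to the same bucket and bit position, so they never change a table.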