Approximate distinct counting in a bounded memory
Abstract:
A table is processed to determine an approximate NDV for a plurality of groups. For each row, a group based is identified based on one or more group-by columns. A hashed valued is generated by applying a uniform hash function to a value in an NDV column. The hashed value is assigned to a particular bucket based on the values at a first set of bit positions in a binary representation of the hashed value. A bit position value is determined based on for a remaining portion of the binary representation of the hashed value. The bit position value is based on a number of ordered bits in the hashed value that match a particular bit pattern. For each group identified, a maximum bit position (MBP) table is generated. The MBP table stores, for one or more buckets, the maximum bit position value determined for hashed values assigned to a particular bucket.
Public/Granted literature
Information query
Patent Agency Ranking
0/0