Method and system to estimate the cardinality of sets and set operation results from single and multiple HyperLogLog sketches
Abstract:
A system and method for the estimation of the cardinality of large sets of transaction trace data is disclosed. The estimation is based on HyperLogLog data sketches that are capable to store cardinality relevant data of large sets with low and fixed memory requirements. The disclosure contains improvements to the known analysis methods for HyperLogLog data sketches that provide improved relative error behavior by eliminating a cardinality range dependent bias of the relative error. A new analysis method for HyperLogLog data structures is shown that uses maximum likelihood analysis methods on a Poisson based approximated probability model. In addition, a variant of the new analysis model is disclosed that uses multiple HyperLogLog data structured to directly provide estimation results for set operations like intersections or relative complement directly from the HyperLogLog input data.
Information query
Patent Agency Ranking
0/0