Abstract:
A method is disclosed for providing efficient and accurate estimates of TV viewership ratings using a distributed computer system that includes multiple computers. The method includes: receiving a query from a client at the distributed computer system; dynamically selecting one or more of the computers according to a predefined sharding function; at each of the selected computers, determining a count of qualified event records that satisfy the query; aggregating the respective counts of qualified event records determined by the selected computers; statistically projecting the aggregated count into an estimated total count of qualified event records on the distributed computer system; and returning the estimated total count to the requesting client.
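
The steps recited above can be illustrated with a minimal single-process sketch. Everything named here is an assumption made for illustration only: the `EventRecord` fields, the CRC32-based `shard_of` function, and in particular the projection rule, which simply scales the aggregated count by the ratio of total shards to selected shards. The abstract does not specify the sharding function or the statistical projection, so this is not the disclosed implementation.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class EventRecord:
    # Hypothetical record layout; the abstract does not define the fields.
    household_id: str
    channel: str
    minutes_watched: int


# A query is modeled as a predicate over a single event record.
Query = Callable[[EventRecord], bool]


def shard_of(household_id: str, num_shards: int) -> int:
    """Assumed sharding function: a stable hash of the household id."""
    return zlib.crc32(household_id.encode("utf-8")) % num_shards


def build_shards(records: Sequence[EventRecord], num_shards: int) -> List[List[EventRecord]]:
    """Distribute records across shards; each shard stands in for one computer."""
    shards: List[List[EventRecord]] = [[] for _ in range(num_shards)]
    for record in records:
        shards[shard_of(record.household_id, num_shards)].append(record)
    return shards


def count_qualified(shard: Sequence[EventRecord], query: Query) -> int:
    """Per-computer step: count the records that satisfy the query."""
    return sum(1 for record in shard if query(record))


def estimate_total(shards: Sequence[Sequence[EventRecord]],
                   selected: Sequence[int],
                   query: Query) -> float:
    """Aggregate counts from the selected shards and project to all shards.

    Assumed projection: linear scaling by len(shards) / len(selected),
    which presumes records are spread roughly evenly across shards.
    """
    if not selected:
        raise ValueError("at least one shard must be selected")
    aggregated = sum(count_qualified(shards[i], query) for i in selected)
    return aggregated * len(shards) / len(selected)


if __name__ == "__main__":
    records = [
        EventRecord(f"hh{i}", "A" if i % 3 == 0 else "B", 10)
        for i in range(30)
    ]
    shards = build_shards(records, num_shards=4)
    watching_a = lambda r: r.channel == "A"
    # Query two of the four shards and project to the full system.
    print(estimate_total(shards, selected=[0, 1], query=watching_a))
```

When every shard is selected, the scale factor is 1 and the estimate equals the exact count; selecting fewer shards trades accuracy for less work per query, which is the trade-off the method relies on.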