Abstract:
A method for providing efficient and accurate estimates of TV viewership ratings through a distributed computer system that includes multiple computers is disclosed. The method includes: receiving a query from a client at the distributed computer system; dynamically selecting one or more computers according to a predefined sharding function; at each of the selected computers, determining a count of qualified event records that satisfy the query; aggregating the respective counts of qualified event records determined by the selected computers; statistically projecting the aggregated count of qualified event records into an estimated total count of qualified event records on the distributed computer system; and returning the estimated total count of qualified event records to the requesting client.
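The count, aggregate, and project steps of this method can be illustrated with a minimal sketch. The abstract does not specify the sharding function, the selection policy, or the projection formula, so everything named here is an assumption made for illustration: `EventRecord`, `shard_of` (a hash of a household ID), and the simple scale-up by the inverse of the sampled fraction of shards.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    # Hypothetical record layout; the abstract does not define the fields.
    household_id: str
    channel: str


def shard_of(household_id, num_shards):
    """Assumed sharding function: a stable hash of the household ID."""
    digest = hashlib.sha256(household_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def count_qualified(records, predicate):
    """Count the event records on one computer that satisfy the query."""
    return sum(1 for record in records if predicate(record))


def estimate_total(shards, selected, predicate):
    """Aggregate counts from the selected shards and project to all shards.

    Assumes records are spread evenly across shards, so the aggregated
    count is scaled by len(shards) / len(selected).
    """
    if not selected:
        raise ValueError("at least one shard must be selected")
    aggregated = sum(count_qualified(shards[i], predicate) for i in selected)
    return aggregated * len(shards) / len(selected)
```

In this sketch each inner list of `shards` stands in for the records held by one computer, and `selected` is whatever subset of shard indexes the system chose for the query. The linear scale-up is only reasonable if the sharding function distributes records roughly uniformly; a real system may use a different statistical projection.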
Abstract:
A method, implemented by a processor, for combining multiple data sources in a product purchase study includes: acquiring, by the processor, first product purchase data for a product from a first data source, the first product purchase data uniquely identifying the product; sending, by the processor, the first product purchase data to a remote server; receiving, by the processor, a signal from the remote server based on the first product purchase data, the signal comprising a request for additional product purchase data; acquiring, by the processor in response to the request, second product purchase data from a second data source independent of the first data source; and sending, by the processor, the second product purchase data to the remote server.
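The exchange between the processor and the remote server can be sketched as follows. This is an illustrative outline only: `Purchase`, `StubServer`, and `run_study_step` are hypothetical names, the server is an in-memory stand-in, and its rule for requesting additional data (always after the first submission) is an assumption, since the abstract says only that the signal is based on the first product purchase data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    product_id: str  # uniquely identifies the product (format is assumed)
    source: str      # label for the data source that produced the record


class StubServer:
    """In-memory stand-in for the remote server; stores what it receives."""

    def __init__(self):
        self.received = []

    def submit(self, purchase):
        """Store the data and return True if more data is requested."""
        self.received.append(purchase)
        # Assumed rule: request additional data only after the first record.
        return len(self.received) == 1


def run_study_step(server, first_source, second_source):
    """Send first-source data, then second-source data if requested."""
    first = first_source()
    wants_more = server.submit(first)
    if wants_more:
        # The second source is queried independently of the first.
        second = second_source(first.product_id)
        server.submit(second)
    return wants_more
```

A call might pass a scanner reading as the first source and a survey response as the second, for example `run_study_step(StubServer(), lambda: Purchase("0123", "scanner"), lambda pid: Purchase(pid, "survey"))`.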