Abstract:
A computer-implemented method for determining an attribute for an online user of a candidate computing device is provided. The method is implemented using a host computing device. The method includes identifying a first set of model data and a plurality of categories for an attribute of a population segment that includes an online user. The first set of model data includes device data, including location data and access data, from a plurality of model computing devices. Each category defines a segment of the attribute. The method further includes training, by the host computing device, a classification model with at least the first set of model data and the plurality of categories. The method also includes identifying device data associated with the candidate computing device. The method further includes applying the device data of the candidate computing device to the classification model to determine a category of the plurality of categories for the online user.
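The train-then-apply flow described above can be illustrated with a minimal sketch. The abstract does not name a classification technique, so a nearest-centroid classifier is swapped in here purely for illustration. The feature layout (latitude, longitude, fraction of night-time accesses), the category names, and the function names `train_centroids` and `classify` are all hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(model_data, categories):
    """Train a nearest-centroid model (an assumed stand-in for the
    unspecified classification model): one mean feature vector per category."""
    sums = {}
    counts = defaultdict(int)
    for features, category in zip(model_data, categories):
        if category not in sums:
            sums[category] = [0.0] * len(features)
        for k, value in enumerate(features):
            sums[category][k] += value
        counts[category] += 1
    return {
        category: [total / counts[category] for total in vector]
        for category, vector in sums.items()
    }


def classify(centroids, features):
    """Return the category whose centroid is closest to the candidate features."""
    return min(centroids, key=lambda c: math.dist(centroids[c], features))


# Hypothetical device data from model computing devices:
# (latitude, longitude, fraction of accesses made at night).
model_data = [
    (40.7, -74.0, 0.8),
    (40.8, -73.9, 0.7),
    (34.0, -118.2, 0.2),
    (34.1, -118.3, 0.1),
]
# Hypothetical categories, each standing for one segment of the attribute.
categories = ["category_a", "category_a", "category_b", "category_b"]

centroids = train_centroids(model_data, categories)

# Device data of the candidate computing device, applied to the trained model.
candidate = (40.6, -74.1, 0.9)
predicted = classify(centroids, candidate)
```

In practice the features would likely need scaling so that no single dimension dominates the distance; the sketch omits that step for brevity.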
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for approximating item counts. One of the methods includes maintaining a collection of counters for a class of items and processing each item in an item stream as a current item. The processing includes determining whether or not the collection includes an item counter for the current item and, if the collection includes an item counter for the current item, updating each count level in the item counter for the current item.
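The per-item processing step can be sketched as follows. The abstract does not say what distinguishes the count levels or what happens when no item counter exists for the current item, so this sketch makes two labeled assumptions: every level is incremented by one on an update, and a missing counter is created only while the collection is below a hypothetical `max_counters` limit. The names `ItemCounter` and `process_stream` are illustrative only.

```python
class ItemCounter:
    """Counter for one item, holding several count levels."""

    def __init__(self, num_levels):
        self.levels = [0] * num_levels

    def update(self):
        # Assumption: updating a level means incrementing it by one.
        for i in range(len(self.levels)):
            self.levels[i] += 1


def process_stream(stream, num_levels=3, max_counters=None):
    """Process each item in the stream as the current item."""
    counters = {}  # the collection of counters, keyed by item
    for item in stream:
        if item in counters:
            # The collection includes an item counter: update every level.
            counters[item].update()
        elif max_counters is None or len(counters) < max_counters:
            # Assumption (not stated in the abstract): create a counter
            # for an unseen item while there is room in the collection.
            counters[item] = ItemCounter(num_levels)
            counters[item].update()
        # Otherwise the item is skipped, which is what makes the counts
        # approximate in this sketch.
    return counters


counters = process_stream(["a", "b", "a", "c", "a"], num_levels=2, max_counters=2)
```

With the limit of two counters, item `"c"` is never tracked, so the resulting counts are approximate rather than exact.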
Abstract:
Methods, systems and apparatus are described herein that include processing a data stream as a sequence of batch jobs during collection of data in the data stream. Processing of successive batch jobs in the sequence includes creating a particular batch job upon completion of processing of a preceding batch job in the sequence. The particular batch job has a batch size that depends upon an amount of data in the data stream that has been collected since creation of the preceding batch job in the sequence, such that the batch size of the particular batch job self-adjusts to data rate changes in the data stream. The particular batch job is then processed to produce resulting data, where processing efficiency and processing time for the particular batch increase with the batch size.
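The self-adjusting behavior can be illustrated with a small simulation. This is a sketch under stated assumptions: the cost model (a fixed `overhead` plus a `per_item` cost) and the function name `simulate_batches` are hypothetical and chosen only so that both processing time and efficiency grow with batch size, as the abstract describes.

```python
def simulate_batches(arrival_times, overhead, per_item):
    """Simulate processing a stream as back-to-back batch jobs.

    Each new batch is created when the preceding batch finishes and
    contains every item collected since the preceding batch was created.
    Returns the list of batch sizes.
    """
    batch_sizes = []
    now = 0.0
    i = 0
    n = len(arrival_times)
    while i < n:
        # If nothing has been collected yet, wait for the next arrival.
        if arrival_times[i] > now:
            now = arrival_times[i]
        # Take everything that has arrived up to the current time.
        j = i
        while j < n and arrival_times[j] <= now:
            j += 1
        size = j - i
        batch_sizes.append(size)
        # Assumed cost model: fixed overhead plus a per-item cost, so larger
        # batches take longer but process more items per unit of time.
        now += overhead + per_item * size
        i = j
    return batch_sizes


# One item arrives per time unit; each batch costs 2.0 + 0.5 per item.
sizes = simulate_batches([float(t) for t in range(10)], overhead=2.0, per_item=0.5)
```

Because each batch takes longer than the one before it, more data accumulates while it runs, and the next batch grows accordingly without any explicit tuning of the batch size.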