Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer-readable storage medium, for labeling user identifiers. A method includes: identifying a set of unlabeled identifiers, wherein an unlabeled identifier has an unknown classification as to a particular class in a multi-class demographic characteristic; determining, for each unlabeled identifier, a probability as to inclusion in a class of the multi-class demographic characteristic based on known user behavior, producing a distribution of probabilities for the unlabeled identifier; for a given unlabeled identifier, adjusting the probability based on a known internet distribution of entities with respect to a given class in the multi-class demographic characteristic and on a distribution of the probabilities among the unlabeled identifiers; and assigning a label for a particular class in the multi-class demographic characteristic to the unlabeled identifier in accordance with the adjusting.
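The adjusting and assigning steps can be illustrated with a minimal sketch. The abstract does not state the adjustment formula, so the re-weighting used here (scale each class probability by the ratio of the known population share to the average predicted share, then renormalize) is an assumption, and the names `adjust_and_label`, `probs`, and `target_dist` are hypothetical.

```python
def adjust_and_label(probs, target_dist):
    """Assign a class label to each unlabeled identifier.

    probs: dict mapping identifier -> list of per-class probabilities
           (assumed to come from a behavior-based model).
    target_dist: list of known population shares, one per class.
    The re-weighting rule below is an illustrative assumption,
    not the formula from the abstract.
    """
    n = len(probs)
    num_classes = len(target_dist)

    # Average predicted probability per class across all unlabeled identifiers.
    mean_pred = [sum(p[k] for p in probs.values()) / n for k in range(num_classes)]

    labels = {}
    for ident, p in probs.items():
        # Scale each class by how under- or over-predicted it is
        # relative to the known population distribution.
        weighted = [
            p[k] * target_dist[k] / mean_pred[k] if mean_pred[k] > 0 else 0.0
            for k in range(num_classes)
        ]
        total = sum(weighted)
        adjusted = [w / total for w in weighted] if total > 0 else list(p)
        # Label with the class that has the highest adjusted probability.
        labels[ident] = max(range(num_classes), key=lambda k: adjusted[k])
    return labels


# Hypothetical inputs: three identifiers, two classes, known 50/50 split.
example_probs = {"a": [0.6, 0.4], "b": [0.7, 0.3], "c": [0.55, 0.45]}
example_labels = adjust_and_label(example_probs, [0.5, 0.5])
```

In this example the raw probabilities favor class 0 for every identifier, while the re-weighting moves the less confident identifiers to class 1 so that the labels sit closer to the known split.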
Abstract:
A computer-implemented method for determining an attribute for an online user of a candidate computing device is provided. The method is implemented using a host computing device. The method includes identifying a first set of model data that includes device data, including location data and access data, from a plurality of model computing devices, and identifying a plurality of categories for an attribute of a population segment that includes an online user. Each category defines a segment of the attribute. The method further includes training a classification model by the host computing device with at least the first set of model data and the plurality of categories. The method also includes identifying device data associated with the candidate computing device. The method further includes applying the device data of the candidate computing device to the classification model to determine a category of the plurality of categories for the online user.
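The train-then-apply flow can be sketched as follows. The abstract does not name a model type, so a nearest-centroid classifier stands in here as an assumption. The two features (share of night-time accesses, number of distinct locations) and the category names are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(model_data):
    """Train a nearest-centroid model (a stand-in, not the patented model).

    model_data: list of (feature_vector, category) pairs taken from
    the model computing devices.
    Returns a dict mapping category -> mean feature vector.
    """
    sums = {}
    counts = defaultdict(int)
    for features, category in model_data:
        if category not in sums:
            sums[category] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[category][i] += value
        counts[category] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def classify(centroids, features):
    """Return the category whose centroid is closest to the candidate's features."""
    return min(centroids, key=lambda c: math.dist(centroids[c], features))


# Hypothetical features: [share of night-time accesses, distinct locations seen].
model_data = [
    ([0.9, 2.0], "category_a"),
    ([0.8, 3.0], "category_a"),
    ([0.2, 8.0], "category_b"),
    ([0.1, 9.0], "category_b"),
]
centroids = train_centroids(model_data)
candidate_category = classify(centroids, [0.7, 3.5])
```

The candidate device's feature vector must use the same feature order and scale as the model data. In practice the features would likely need normalizing so that no single one dominates the distance.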
Abstract:
Systems and methods for content selection with precision controls include receiving device identifier data from multiple sources. A machine learning model may be applied to the device identifier data and content selection parameter values may be predicted. Percentiles for the predicted content selection parameter values may be analyzed to determine precision factors for the predicted content selection parameter values.
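The abstract does not define how precision factors are derived from percentiles. The sketch below shows one possible reading, offered purely as an assumption: each prediction's precision factor is its percentile rank among all predicted values, and a precision control keeps only predictions at or above a chosen rank. The names `percentile_ranks`, `select_with_precision`, and `min_factor` are hypothetical.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Return, for each value, the fraction of values less than or equal to it."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def select_with_precision(predictions, min_factor):
    """Keep predictions whose percentile rank meets the precision threshold.

    predictions: dict mapping device identifier -> predicted parameter value
                 (assumed to be the output of a machine learning model).
    min_factor: hypothetical precision control in the range 0 to 1.
    Returns a dict mapping the kept identifiers to their precision factors.
    """
    ids = list(predictions)
    factors = percentile_ranks([predictions[i] for i in ids])
    return {i: f for i, f in zip(ids, factors) if f >= min_factor}


# Hypothetical model outputs for four device identifiers.
predicted = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.7}
kept = select_with_precision(predicted, 0.75)
```

Raising `min_factor` trades reach for precision: fewer device identifiers are kept, and those that remain are the ones with the highest predicted values.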