Abstract:
A method and a system for abstracting cooked variables from raw variables. In one embodiment, a data set that has a plurality of records is input into a system, where each record has a value for each of a plurality of raw transactional variables. These variables are organized into a hierarchy of nodes. The raw transactional variables are abstracted into a smaller number of cooked transactional variables, and the cooked transactional variables are output.
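As a rough illustration of the abstraction step, the sketch below rolls raw per-product values up to their parent nodes in a hierarchy, yielding fewer cooked variables than raw ones. The hierarchy contents, the variable names, and the use of summation as the roll-up rule are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict

# Hypothetical hierarchy: each raw variable (leaf) maps to a parent node.
PARENT = {
    "cola": "beverages",
    "juice": "beverages",
    "bread": "bakery",
    "bagel": "bakery",
}

def cook_record(raw_record, parent=PARENT):
    """Sum raw variable values under their parent node (assumed roll-up rule)."""
    cooked = defaultdict(int)
    for name, value in raw_record.items():
        cooked[parent[name]] += value
    return dict(cooked)

# Two example records, each with a value for every raw variable.
records = [
    {"cola": 2, "juice": 0, "bread": 1, "bagel": 1},
    {"cola": 0, "juice": 3, "bread": 0, "bagel": 0},
]
cooked_records = [cook_record(r) for r in records]
```

Here four raw variables collapse into two cooked variables per record.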
Abstract:
Targeted delivery of items with inventory management using a cluster-based approach or a rule-based approach is disclosed. Advertisements are one example of such items. Each item is allocated to one or more clusters. The allocation is made based on a predetermined criterion accounting for at least a quota for each item and possibly a constraint for each cluster. The quota can refer to the number of times an item must be shown. The constraint can refer to the number of times a given group of web pages is likely to be visited by users, and hence to the number of times items can be shown in a given cluster. The invention is not limited to any particular definition of what constitutes a cluster or item.
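One way to picture quota-and-constraint allocation is the greedy sketch below. It is not the method claimed in the abstract, which leaves the criterion open; the per-pair scores, the item and cluster names, and the greedy rule itself are assumptions for illustration.

```python
def allocate(quotas, capacities, scores):
    """Greedy heuristic: fill the best-scoring (item, cluster) pairs first.

    quotas:     impressions each item must receive
    capacities: impressions each cluster can supply
    scores:     hypothetical desirability of showing an item in a cluster
    """
    remaining_quota = dict(quotas)
    remaining_cap = dict(capacities)
    allocation = {}
    for item, cluster in sorted(scores, key=scores.get, reverse=True):
        n = min(remaining_quota[item], remaining_cap[cluster])
        if n > 0:
            allocation[(item, cluster)] = n
            remaining_quota[item] -= n
            remaining_cap[cluster] -= n
    return allocation

quotas = {"ad_a": 100, "ad_b": 50}
capacities = {"news": 80, "sports": 90}
scores = {
    ("ad_a", "news"): 0.05,
    ("ad_a", "sports"): 0.02,
    ("ad_b", "news"): 0.04,
    ("ad_b", "sports"): 0.03,
}
allocation = allocate(quotas, capacities, scores)
```

With these numbers both quotas are met without exceeding either cluster's capacity. A greedy pass does not guarantee this in general, which is one reason an optimization-based formulation may be preferred.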
Abstract:
The transmission of information during ad click-through is disclosed. In one embodiment, a computer-implemented method selects an ad to be displayed on a web page, as one of a plurality of ads within a current cluster in which each of the ads has a probability of being selected. The method displays the ad on the web page, and then detects activation (for example, click-through) of the displayed ad. The method transmits information to an entity associated with the ad, such as an advertiser, upon detecting click-through or other activation of the ad. In one embodiment, the information transmitted includes information regarding the current cluster.
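The selection and transmission steps can be sketched as follows. This is a minimal illustration, assuming the per-ad probabilities are already known for the current cluster; the ad names, the cluster label, and the payload fields are hypothetical.

```python
import random

def select_ad(cluster_ads, rng=random):
    """Pick one ad from the cluster according to its selection probability."""
    ads = list(cluster_ads)
    weights = [cluster_ads[ad] for ad in ads]
    return rng.choices(ads, weights=weights, k=1)[0]

def click_through_payload(ad, cluster_id):
    """Information sent to the advertiser on click-through (assumed fields)."""
    return {"ad": ad, "cluster": cluster_id}

cluster_id = "sports-pages"  # hypothetical cluster label
cluster_ads = {"ad_a": 0.7, "ad_b": 0.3, "ad_c": 0.0}

rng = random.Random(0)  # seeded only to make the example repeatable
shown = select_ad(cluster_ads, rng)
payload = click_through_payload(shown, cluster_id)
```

An ad with probability zero, such as `ad_c` here, is never selected.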
Abstract:
Clustering for purposes of data visualization and making predictions is disclosed. Embodiments of the invention are operable on a number of variables that have a predetermined representation. The variables include input-only variables, output-only variables, and both input-and-output variables. Embodiments of the invention generate a model that has a bottleneck architecture. The model includes a top layer of nodes of at least the input-only variables, one or more middle layers of hidden nodes, and a bottom layer of nodes of the output-only and the input-and-output variables. At least one cluster is determined from this model. The model can be a probabilistic neural network and/or a Bayesian network.
Abstract:
An architecture for automated data analysis. In one embodiment, a computerized system comprises an automated problem formulation layer, a first learning engine, and a second learning engine. The automated problem formulation layer receives a data set. The data set has a plurality of records, where each record has a value for each of a plurality of raw transactional variables. The layer abstracts the raw transactional variables into cooked transactional variables. The first learning engine generates a model for the cooked transactional variables, while the second learning engine generates a model for the raw transactional variables.
Abstract:
A decision theoretic approach to targeted solicitation, by maximizing expected profit increases, is disclosed. A decision theoretic model is used to identify a sub-population of a population to solicit, where the model is constructed to maximize an expected increase in profits. A decision tree in particular can be used as the model. The decision tree has paths from a root node to a number of leaf nodes. The decision tree has a split on a solicitation variable in every path from the root node to each leaf node. The solicitation variable has two values, a first value corresponding to a solicitation having been made, and a second value corresponding to a solicitation not having been made.
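The decision rule implied by the split on the solicitation variable can be sketched as below: for each sub-population (for example, a set of leaves that differ only in the solicitation value), compare expected profit with and without solicitation and solicit only where the difference is positive. The segment names, purchase probabilities, profit per sale, and solicitation cost are made-up numbers for illustration.

```python
def expected_lift(p_buy_solicited, p_buy_not_solicited, profit_per_sale, cost):
    """Expected profit increase from soliciting one member of a sub-population."""
    profit_if_solicited = p_buy_solicited * profit_per_sale - cost
    profit_if_not = p_buy_not_solicited * profit_per_sale
    return profit_if_solicited - profit_if_not

# Hypothetical sub-populations: (P(buy | solicited), P(buy | not solicited)).
segments = {
    "young_urban": (0.10, 0.02),
    "loyal": (0.30, 0.29),   # buys anyway, so soliciting adds little
    "lapsed": (0.05, 0.01),
}
PROFIT_PER_SALE = 50.0  # assumed
SOLICITATION_COST = 1.0  # assumed

to_solicit = [
    name
    for name, (p_s, p_n) in segments.items()
    if expected_lift(p_s, p_n, PROFIT_PER_SALE, SOLICITATION_COST) > 0
]
```

Note that the "loyal" segment has the highest purchase probability yet is not solicited, because solicitation barely changes its behavior.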
Abstract:
Reduction of noise within a cluster-based approach for item (such as ad) allocation, such as by using a linear program, is described. In one embodiment, probabilities are discretized into a predetermined number of groups, where the mean for the group that a particular probability has been discretized into is substituted for the particular probability when the items are being allocated. In another embodiment, the probabilities are decreased by a power function of the variances for them. In a third embodiment, allocation of items to clusters is not changed unless the sample sizes used to determine the corresponding probabilities for those items are greater than a threshold. In a fourth embodiment, after allocation is performed a first time, a predetermined number of items are removed, and reallocation is performed.
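The first embodiment, substituting a group mean for each probability, might look like the sketch below. Equal-width bins over [0, 1] are an assumption; the abstract says only that there is a predetermined number of groups. The sample probabilities are illustrative.

```python
from collections import defaultdict

def bin_index(p, n_groups):
    """Equal-width bin for probability p in [0, 1] (assumed grouping rule)."""
    return min(int(p * n_groups), n_groups - 1)  # p == 1.0 goes in the last bin

def smooth_by_group_mean(probs, n_groups):
    """Replace each probability with the mean of the group it falls into."""
    groups = defaultdict(list)
    for p in probs:
        groups[bin_index(p, n_groups)].append(p)
    means = {idx: sum(vals) / len(vals) for idx, vals in groups.items()}
    return [means[bin_index(p, n_groups)] for p in probs]

click_probs = [0.02, 0.04, 0.30, 0.34, 0.90]
smoothed = smooth_by_group_mean(click_probs, n_groups=4)
```

Probabilities that land in the same group become identical, so small, noisy differences between them no longer influence the allocation.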
Abstract:
Visualization of high-dimensional data sets is disclosed, particularly the display of a network model for a data set. The network, such as a dependency or a Bayesian network, has a number of nodes having dependencies thereamong. The network can be displayed as items and connections, corresponding to nodes and dependencies, respectively. Selection of a particular item in one embodiment results in the display of the local distribution associated with the node for the item. In one embodiment, only a predetermined number of the items are shown, such as only the items representing the most popular nodes. Furthermore, in one embodiment, in response to receiving a user input, a sub-set of the connections is displayed, proportional to the user input. In another embodiment, a particular item is displayed in an emphasized manner, and the particular connections representing dependencies including the node represented by the particular item, as well as the items representing nodes also in these dependencies, are also displayed in the emphasized manner. Furthermore, in one embodiment, only an indicated sub-set of the items is displayed.
Abstract:
A system and method are employed to construct an association network to visualize relationships between variables of a data set. The relationships characterized by the association network may include symmetric or asymmetric measures of association between variables learned from the data. The association network includes nodes, which represent variables, and edges, which represent associations between variables. As a result, the association network helps a user to visualize useful information from data according to the determined measure of association.
Abstract:
An indirect calorimeter estimates nutritional caloric intake by periodically monitoring weight and sensing physical exercise (i.e., physiological data and/or motion data related to physical exertion), which can then be used in a calorimetry model derived from regression analysis of a population (e.g., linear regression, feed-forward neural network, Gaussian process, boosted regression tree, etc.). Exercise can be tracked by a strap-on user device that senses one or more of heart rate, body temperature, skin resistance, motion or acceleration (e.g., pedometer, accelerometer), and velocity (e.g., global positioning system (GPS)), or by an intelligent, integrated exercise machine (e.g., treadmill, exercise bike, etc.). To gain further fidelity, the user can fine-tune the estimate by undergoing a journal-based routine for a relatively short period of time or a clinical calorimetry measurement (e.g., respiratory calorimeter), thereby providing a baseline for resting or exercising metabolic rate.