Abstract:
Disclosed is a system for approximating conditional probabilities using an annotated decision tree where predictor values that did not exist in training data for the system are tracked, stored, and referenced to determine if statistical aggregation should be invoked. Further disclosed is a system for storing statistics for deriving a non-leaf probability corresponding to predictor values, and a system for aggregating such statistics to approximate conditional probabilities.
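The back-off idea described above can be sketched as follows. This is only an illustrative reading of the abstract, not the disclosed implementation: the `Node` class, the `conditional_probability` function, and the toy "color" predictor are all hypothetical names, and the sketch assumes that class counts are stored at every node so that a non-leaf node can answer when a predictor value was never seen in training.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Tree node that keeps class counts even when it is not a leaf (assumed layout)."""
    counts: Counter = field(default_factory=Counter)
    split_on: Optional[str] = None                      # predictor tested at this node
    children: Dict[str, "Node"] = field(default_factory=dict)


def conditional_probability(root: Node, case: Dict[str, str], target: str) -> float:
    """Approximate P(target | case), backing off at unseen predictor values."""
    node = root
    while node.split_on is not None:
        value = case.get(node.split_on)
        if value not in node.children:
            # The value did not occur in training data for this split:
            # stop descending and use the statistics aggregated here.
            break
        node = node.children[value]
    total = sum(node.counts.values())
    return node.counts[target] / total if total else 0.0


# Hypothetical toy tree: one split on "color"; only "red" and "blue" were seen.
red = Node(counts=Counter({"yes": 8, "no": 2}))
blue = Node(counts=Counter({"yes": 1, "no": 9}))
root = Node(counts=red.counts + blue.counts, split_on="color",
            children={"red": red, "blue": blue})

p_seen = conditional_probability(root, {"color": "red"}, "yes")      # leaf statistics
p_unseen = conditional_probability(root, {"color": "green"}, "yes")  # non-leaf statistics
```

In this toy case the seen value is answered from the "red" leaf (8 of 10), while the unseen value "green" is answered from the counts aggregated at the root (9 of 20).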
Abstract:
Decision trees populated with classifier models are leveraged to provide enhanced spam detection, using a separate email classifier for each feature of an email. Tailoring each classifier model in this way raises the probability of detecting spam, because spam is determined more accurately on a feature-by-feature basis. Classifiers can be constructed based on linear models such as, for example, logistic-regression models and/or support vector machines (SVMs) and the like. The classifiers can also be constructed based on decision trees. “Compound features” based on internal and/or external nodes of a decision tree can be utilized to provide linear classifier models as well. If training data is sparse, the spam detection results can be smoothed by utilizing classifier models from other nodes within the decision tree. This forms a base model for branches of a decision tree that may not have received substantial training data.
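One way to picture the smoothing step is a fallback to a better-trained node on the email's path through the tree. The sketch below is an assumption-laden illustration, not the claimed method: the abstract only says that classifier models from other nodes can be used, so the choice of ancestor nodes, the `min_train` threshold, and the names `TreeNode` and `spam_score` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

Email = Dict[str, str]


@dataclass
class TreeNode:
    classifier: Callable[[Email], float]   # returns a spam score in [0, 1]
    n_train: int                            # training emails that reached this node
    split_on: Optional[str] = None          # email feature tested at this node
    children: Dict[str, "TreeNode"] = field(default_factory=dict)


def spam_score(root: TreeNode, email: Email, min_train: int = 50) -> float:
    """Score with the deepest node on the email's path that saw enough data."""
    node, best = root, root
    while node.split_on is not None:
        child = node.children.get(email.get(node.split_on, ""))
        if child is None:
            break
        node = child
        if node.n_train >= min_train:
            best = node                     # well-trained node: prefer its model
    return best.classifier(email)


# Hypothetical toy tree with constant classifiers standing in for real models.
well_trained = TreeNode(classifier=lambda e: 0.9, n_train=200)
sparse = TreeNode(classifier=lambda e: 0.1, n_train=5)
root = TreeNode(classifier=lambda e: 0.5, n_train=1000,
                split_on="has_attachment",
                children={"yes": well_trained, "no": sparse})

score_yes = spam_score(root, {"has_attachment": "yes"})  # uses the child's model
score_no = spam_score(root, {"has_attachment": "no"})    # falls back to the root's model
```

Here the sparsely trained "no" branch is ignored in favor of the root model, which plays the role of the base model for branches that received little training data.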
Abstract:
The present invention utilizes a cross-prediction scheme to predict values of discrete and continuous time observation data, wherein conditional variance of each continuous time tube variable is fixed to a small positive value. By allowing cross-predictions in an ARMA based model, values of continuous and discrete observations in a time series are accurately predicted. The present invention accomplishes this by extending an ARMA model such that a first time series “tube” is utilized to facilitate or “cross-predict” values in a second time series tube to form an “ARMAxp” model. In general, in the ARMAxp model, the distribution of each continuous variable is a decision graph having splits only on discrete variables and having linear regressions with continuous regressors at all leaves, and the distribution of each discrete variable is a decision graph having splits only on discrete variables and having additional distributions at all leaves.
Abstract:
Systems and methods allow an on-line game to extract information relevant to a specific need of a game platform or service platform. The specific need relates to management and use of digital content, and is addressed by designing and playing an on-line collaborative game. The rules of the game are designed to solve a specific task dictated by the specific need. Players' responses to the game generate a wealth of information related to a specific task objective, such as ranking, sorting, and evaluating a set of digital content items. To encourage participation in a game, players can be rewarded with monetary rewards. As an illustration, an image selection game (ISG) that exploits human contextual inference is described in detail. The information extracted from the ISG is a list of key-image associations, relevant for the task of image sorting and ranking.
Abstract:
Useful information is acquired from a community of individuals by way of a game that rewards participants with social information about other participants. Points can be awarded to participants simply for participation and/or as a function of game performance. Such points can subsequently be exchanged to reveal information about game partners or other community members. Among other things, such a reward system can motivate individuals to perform tasks that might not otherwise be compelling and/or enjoyable.
Abstract:
On-line and/or off-line advertisement interactions are tracked for individual users. This information can then be utilized to adjust display parameters for an advertisement. Tracking can be accomplished via a client-side tracking mechanism and/or a server-side tracking mechanism. The advertisement interactions allow advertisers to adjust their advertising campaigns to better target their advertisements. The tracked interactions can include, but are not limited to, selections (clicking, etc.) and/or conversions (purchases) and the like. Some instances include a display component that can employ the user-specific interaction information to automatically adjust, for example, location, frequency, and/or to whom an advertisement is displayed. The interaction information can also be utilized for revenue generation by charging advertisers for the information and/or for adjusting their advertising campaigns and the like. Instances can be utilized with on-line and/or off-line advertising media.
Abstract:
Advertiser monetization information is utilized to determine a search query monetization value that can be employed in web-search ranking to help rank search results and/or in email spam filtering to reduce unsolicited emails and the like. Various methods can be employed to filter and/or rank and the like based on the search query monetization value. This can include biasing based on high values and/or low values. The search query monetization value can be determined based on, for example, independent phrases and/or bids. In other instances, personal user advertising interactions can be employed as well to facilitate search result ranking and/or email spam filtering. Employment of search query monetization value techniques can substantially reduce various types of subversive/undesired information.
Abstract:
A service manager manages connection tokens in a network of users. The connection token has a plurality of defined terms and can be representative of a commitment of time for a user in the network. Connection tokens can be used to engage in a real-time communication with another user in exchange for a fee. The service manager manages possession of the connection tokens amongst the users of the network and executes the connection token in accordance with the defined terms. Additionally, the service manager can facilitate real-time communication among users based on the connection tokens.
Abstract:
A service manager manages information solicitations in a network of users. An information solicitation is posted that is received from an information consumer. The posted information solicitation is provided to at least a portion of the users of the network for auction. The information solicitation includes a request to engage in a real-time communication with an information provider about a particular subject. Bids are received from a plurality of information providers. The bids are provided to the information consumer for selection. The information consumer is connected with a selected one of the plurality of information providers.
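The sequence above (post, collect bids, select, connect) can be sketched as a small in-memory model. Everything named here is hypothetical and introduced only for illustration: the `ServiceManager`, `Solicitation`, and `Bid` classes, the `price` field, and the lowest-price selection rule are assumptions, since the abstract leaves the bid contents and the consumer's selection criterion open.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Bid:
    provider: str
    price: float          # assumed bid content; the abstract does not specify it


@dataclass
class Solicitation:
    consumer: str
    subject: str
    bids: List[Bid] = field(default_factory=list)
    connected_to: Optional[str] = None


class ServiceManager:
    """Toy stand-in for the service manager; no networking is modeled."""

    def __init__(self) -> None:
        self.posted: List[Solicitation] = []

    def post(self, consumer: str, subject: str) -> Solicitation:
        # Post the solicitation so that other users can bid on it.
        solicitation = Solicitation(consumer, subject)
        self.posted.append(solicitation)
        return solicitation

    def receive_bid(self, solicitation: Solicitation, provider: str, price: float) -> None:
        solicitation.bids.append(Bid(provider, price))

    def connect(self, solicitation: Solicitation,
                choose: Callable[[List[Bid]], Bid]) -> str:
        # The consumer's own selection rule picks the winning provider.
        selected = choose(solicitation.bids)
        solicitation.connected_to = selected.provider
        return selected.provider


manager = ServiceManager()
request = manager.post("alice", "tax law")
manager.receive_bid(request, "bob", 30.0)
manager.receive_bid(request, "carol", 20.0)

# Illustrative rule only: the consumer picks the cheapest bid.
chosen = manager.connect(request, lambda bids: min(bids, key=lambda b: b.price))
```

Passing the selection rule in as a function keeps the choice with the information consumer, as the abstract describes, rather than fixing it inside the manager.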
Abstract:
A system incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, the depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank-ordered based on normalized scores, and then graphically displayed. The system permits a user to browse through the hierarchy and, to readily comprehend segment inter-relationships, to selectively expand and contract the displayed hierarchy as desired, as well as to compare two selected segments or segment groups and graphically display the results of that comparison. An alternative discriminant-based cluster scoring technique is also presented.