Abstract:
A system is disclosed for obtaining and aggregating opinions generated by multiple sources with respect to one or more objects. The disclosed system uses observed variables associated with an opinion and a probabilistic model to estimate latent properties of that opinion. With those latent properties, the disclosed system may enable publishers to reliably and comprehensively present object information to interested users.
Abstract:
Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.
Abstract:
An improved system and method for generating a maximum utility slate of advertisements for online advertisement auctions is provided. Various utility factors for each advertisement that may be a candidate in a slate of advertisements may be applied within a framework in order to generate a maximum utility slate of advertisements. Either backward or forward dynamic programming may be applied to recursively evaluate the utility of subslates of advertisements in order to generate a maximum utility slate of advertisements. In an embodiment, a network with directed edges and associated costs may be defined, and the longest path may be found in the directed network for constructing a maximum utility slate of advertisements. Various utility factors may be applied for different objectives of an auctioneer and the framework presented may be extended for revenue ordering, exclusion of bidders, ordering slates according to first and second price utilities, and so forth.
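The forward dynamic programming idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that candidate advertisements arrive in a fixed order that the slate must preserve, and that a hypothetical table utility[i][p] gives the utility of showing advertisement i in position p. Neither the table nor the function name max_utility_slate comes from the abstract. Under those assumptions, filling positions is equivalent to finding a longest path in a layered directed network whose nodes are (advertisements considered, positions filled).

```python
from typing import List, Tuple


def max_utility_slate(utility: List[List[float]]) -> Tuple[float, List[int]]:
    """Return (best total utility, chosen ad indices in slate order).

    utility[i][p] is a hypothetical utility of ad i shown in position p.
    Assumption: ads are pre-ordered and the slate keeps that order.
    """
    n_ads = len(utility)
    n_pos = len(utility[0]) if utility else 0
    neg_inf = float("-inf")

    # best[i][p]: best utility using some of the first i ads to fill
    # exactly the first p positions (-inf when that is impossible).
    best = [[neg_inf] * (n_pos + 1) for _ in range(n_ads + 1)]
    took = [[False] * (n_pos + 1) for _ in range(n_ads + 1)]
    for i in range(n_ads + 1):
        best[i][0] = 0.0

    for i in range(1, n_ads + 1):
        for p in range(1, n_pos + 1):
            skip = best[i - 1][p]  # ad i-1 is left out
            use = best[i - 1][p - 1] + utility[i - 1][p - 1]  # ad i-1 fills position p-1
            if use > skip:
                best[i][p] = use
                took[i][p] = True
            else:
                best[i][p] = skip

    # A slate may leave trailing positions empty; pick the best fill count.
    p_best = max(range(n_pos + 1), key=lambda p: best[n_ads][p])

    # Walk back along the longest path to recover the chosen ads.
    slate: List[int] = []
    i, p = n_ads, p_best
    while i > 0 and p > 0:
        if took[i][p]:
            slate.append(i - 1)
            p -= 1
        i -= 1
    slate.reverse()
    return best[n_ads][p_best], slate
```

Each table cell corresponds to a node of the directed network, and the two transitions (skip or use) are its incoming edges, so the recursion evaluates subslates exactly once each. A backward variant would fill the table from the last position toward the first.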
Abstract:
In one embodiment, training a ranking model comprises: accessing the ranking model and an objective function of the ranking model; accessing one or more preference pairs of objects, wherein for each of the preference pairs of objects comprising a first object and a second object, there is a preference between the first object and the second object with respect to a particular reference, and the first object and the second object each has a feature vector comprising one or more feature values; and training the ranking model by minimizing the objective function using the preference pairs of objects, wherein for each of the preference pairs of objects, a difference between the feature vector of the first object and the feature vector of the second object is not calculated.
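One way to train on preference pairs without forming per-pair difference vectors is to score every object once and accumulate per-object gradient coefficients. The sketch below is only an illustration under stated assumptions: a linear model, a squared hinge pairwise loss with L2 regularization, and plain gradient descent. The abstract does not name a model, loss, or optimizer, and the function name train_pairwise_ranker and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_pairwise_ranker(
    features: Sequence[Sequence[float]],
    pairs: Sequence[Tuple[int, int]],
    lr: float = 0.1,
    reg: float = 0.01,
    epochs: int = 200,
) -> List[float]:
    """Fit linear weights w so that score(preferred) exceeds score(other).

    pairs holds (preferred_index, other_index) into features.
    Objective (an assumption): reg/2 * |w|^2 + sum of max(0, 1 - (s_p - s_o))^2.
    """
    n_feat = len(features[0])
    w = [0.0] * n_feat
    for _ in range(epochs):
        # Score each object once; pairs only ever look at scalar scores.
        scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in features]

        # coeff[i] is the derivative of the pairwise loss w.r.t. scores[i].
        coeff = [0.0] * len(features)
        for preferred, other in pairs:
            margin = 1.0 - (scores[preferred] - scores[other])
            if margin > 0:
                coeff[preferred] -= 2.0 * margin
                coeff[other] += 2.0 * margin

        # Chain rule: the gradient is a weighted sum of the object feature
        # vectors themselves, so no x_p - x_o vector is ever built.
        grad = [reg * wj for wj in w]
        for c, x in zip(coeff, features):
            if c != 0.0:
                for j, xj in enumerate(x):
                    grad[j] += c * xj

        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

With n objects and m pairs, each epoch costs one pass over the n feature vectors plus O(m) scalar work, instead of m vector subtractions.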
Abstract:
Disclosed are methods and apparatus for segmenting and labeling a collection of token sequences. A plurality of segments of one or more tokens in a token sequence collection are partially labeled with labels from a set of target labels using high precision domain-specific labelers so as to generate a partially labeled sequence collection having a plurality of labeled segments and a plurality of unlabeled segments. Any label conflicts in the partially labeled sequence collection are resolved. One or more of the labeled segments of the partially labeled sequence collection are expanded so as to cover one or more additional tokens of the partially labeled sequence collection. A statistical model, for labeling segments using local token and segment features of the sequence collection, is trained based on the partially labeled sequence collection. This trained model is then used to label the unlabeled segments and the labeled segments of the sequence collection so as to generate a labeled sequence collection. The labeled sequence collection is then stored as structured output records in a database.
Abstract:
A taxonomy model is determined with a reduced number of weights. For example, the taxonomy model is a tangible representation of a hierarchy of nodes that represents a hierarchy of classes and that, when labeled with a representation of a combination of weights, is usable to classify documents having known features but unknown class. For each node of the taxonomy, the training example documents are processed to determine the features for which there are a sufficient number of training example documents having a class label corresponding to at least one of the leaf nodes of a subtree having that node as a root node. For each node of the taxonomy, a sparse weight vector is determined for that node, including setting to zero the weights, for that node, of those features determined not to appear at least a minimum number of times in a given set of leaf nodes in the subtree with that node as a root node. The sparse weight vectors can be learned by solving an optimization problem using a maximum entropy classifier, or a large margin classifier with a sequential dual method (SDM) with margin or slack rescaling. The determined sparse weight vectors are tangibly embodied in a computer-readable medium in association with the tangible representation of the nodes of the taxonomy.
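The per-node feature filtering step can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the taxonomy is assumed to be a dictionary mapping each internal node to its children, each training document is assumed to be a (leaf label, feature set) pair, and the names active_features_per_node and min_count are hypothetical. Features outside a node's active set would have their weights fixed at zero for that node before any classifier is trained.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def active_features_per_node(
    children: Dict[str, List[str]],
    docs: Iterable[Tuple[str, Set[str]]],
    min_count: int,
) -> Dict[str, Set[str]]:
    """Map each taxonomy node to the features allowed a nonzero weight.

    children: internal node -> list of child nodes (leaves are absent as keys).
    docs: (leaf_label, features) training examples.
    A feature is active at a node if at least min_count documents labeled
    with a leaf in that node's subtree contain it.
    """
    # Document frequency of every feature, per leaf.
    leaf_counts: Dict[str, Counter] = {}
    for leaf, feats in docs:
        leaf_counts.setdefault(leaf, Counter()).update(set(feats))

    active: Dict[str, Set[str]] = {}

    def subtree_counts(node: str) -> Counter:
        # Aggregate counts from the node's own documents and all descendants.
        total = Counter(leaf_counts.get(node, Counter()))
        for child in children.get(node, []):
            total.update(subtree_counts(child))
        active[node] = {f for f, c in total.items() if c >= min_count}
        return total

    # Roots are nodes that never appear as someone's child.
    all_children = {c for kids in children.values() for c in kids}
    for root in set(children) - all_children:
        subtree_counts(root)
    return active
```

Because counts are aggregated bottom-up, nodes near the root tend to keep more features than leaves, which is where most of the reduction in the number of weights would come from.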
Abstract:
An improved system and method is provided for training a multi-class support vector machine to select a common subset of features for classifying objects. A multi-class support vector machine generator may be provided for learning classification functions to classify sets of objects into classes and may include a sparse support vector machine modeling engine for training a multi-class support vector machine using scaling factors by simultaneously selecting a common subset of features iteratively for all classes from sets of features representing each of the classes. An objective function using scaling factors to ensure sparsity of features may be iteratively minimized, and features may be retained and added until a small set of features stabilizes. Alternatively, a common subset of features may be found by iteratively removing at least one feature simultaneously for all classes from an active set of features initialized to represent the entire set of training features.
Abstract:
The present invention provides methods and systems for binary classification of items. Methods and systems are provided for constructing a machine learning-based and pairwise ranking method-based classification model for binary classification of items as positive or negative with regard to a single class, based on training using a training set of examples including positive examples and unlabelled examples. The model includes only one hyperparameter and only one threshold parameter, which are selected to optimize the model with regard to constraining positive items to be classified as positive while minimizing a number of unlabelled items classified as positive.
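The trade-off behind the threshold parameter can be shown with a small sketch. It assumes a scoring model already exists and that an item is classified positive when its score is at or above the threshold. The function names and the min_recall parameter are hypothetical and not taken from the abstract. Raising the threshold can only reduce the number of unlabelled items classified positive, so the sketch picks the highest threshold that still keeps the required share of known positives.

```python
import math
from typing import Sequence


def select_threshold(pos_scores: Sequence[float], min_recall: float = 1.0) -> float:
    """Highest threshold that still classifies at least min_recall of the
    known positive examples as positive (score >= threshold)."""
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must stay at or above the threshold.
    k = max(1, math.ceil(min_recall * len(ranked)))
    return ranked[k - 1]


def count_predicted_positive(scores: Sequence[float], threshold: float) -> int:
    """How many items would be classified positive at this threshold."""
    return sum(1 for s in scores if s >= threshold)
```

For example, with positive scores [0.9, 0.8, 0.6, 0.95] and unlabelled scores [0.1, 0.7, 0.85, 0.3], requiring all positives gives a threshold of 0.6 and admits two unlabelled items, while requiring three of four gives 0.8 and admits one.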
Abstract:
An improved system and method for scheduling online keyword auctions over multiple time periods subject to budget constraints is provided. A linear programming model of slates of advertisements may be created for predicting the volume and order in which queries may appear throughout multiple time periods for use in allocating bidders to auctions to optimize revenue of an auctioneer. Each slate of advertisements may represent a candidate set of advertisements in order of optimal revenue to an auctioneer. Linear programming using column generation with the keyword as a constraint and a bidder's budget as a constraint may be applied for each time period to generate a column that may be added to a linear programming model of slates of advertisements. Upon receiving a query request, a slate of advertisements for the time period may be output for sending to a web browser for display.