Abstract:
A method and system for generating a search result for a query of hierarchically organized documents based on retrieval of subtrees that are key resources for topic distillation is provided. The retrieval system may identify documents relevant to a query using conventional searching techniques. The retrieval system then calculates a subtree feature for each subtree that has an identified document as its root. After calculating the subtree features, the retrieval system may generate a subtree relevance score for each subtree based on its subtree feature. The retrieval system may then order the identified documents based on their corresponding subtree relevance scores.
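The ranking flow described above can be sketched as follows. This is a minimal illustration, not the claimed method: the `Doc` type, the choice of the mean subtree score as the subtree feature, and the `alpha` blending weight are all assumptions introduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    doc_id: str
    score: float                      # relevance from the conventional search step
    children: list = field(default_factory=list)

def subtree_scores(root):
    """Collect the relevance scores of every document in the subtree under root."""
    scores = [root.score]
    for child in root.children:
        scores.extend(subtree_scores(child))
    return scores

def subtree_relevance(root, alpha=0.5):
    """Hypothetical scoring: blend the root's own score with the subtree mean."""
    scores = subtree_scores(root)
    feature = sum(scores) / len(scores)   # assumed subtree feature: mean score
    return alpha * root.score + (1 - alpha) * feature

def rank_by_subtree(identified_docs, alpha=0.5):
    """Order the identified documents by their subtree relevance score."""
    return sorted(identified_docs,
                  key=lambda d: subtree_relevance(d, alpha),
                  reverse=True)
```

Under these assumptions, a root page with a modest score of its own can outrank a stronger isolated page when its descendants are highly relevant.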
Abstract:
An importance system calculates the importance of pages using a conditional Markov random walk model rather than a conventional Markov random walk model. The importance system calculates the importance of pages factoring in the importance of sites that contain those pages. The importance system may factor in the importance of sites based on the strength of the correlation of the importance of a page to the importance of a site. The strength of the correlation may be based upon the depth of the page within the site. The importance system may iteratively calculate the importance of the pages using “conditional” transition probabilities. During each iteration, the importance system may recalculate the conditional transition probabilities based on the importance of sites that are derived from the recalculated importance of pages during the iteration.
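One possible shape of the iteration is sketched below. The abstract does not give the formula for the conditional transition probabilities, so the weighting used here (`1 + site importance / depth`), the damping factor, and all function and parameter names are assumptions made for illustration only.

```python
def conditional_walk(links, site_of, depth, iterations=50, damping=0.85):
    """Iteratively estimate page importance with site-aware transitions.

    links:   page -> list of target pages
    site_of: page -> site identifier
    depth:   page -> depth of the page within its site (>= 1)
    """
    pages = sorted(site_of)
    n = len(pages)
    importance = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Derive site importance from the current page importance.
        site_imp = {}
        for p in pages:
            site_imp[site_of[p]] = site_imp.get(site_of[p], 0.0) + importance[p]
        new = {p: (1 - damping) / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if not targets:
                # A page without links spreads its importance uniformly.
                for p in pages:
                    new[p] += damping * importance[src] / n
                continue
            # Assumed weighting: shallow pages correlate more strongly
            # with the importance of their site than deep pages.
            weights = [1.0 + site_imp[site_of[t]] / depth[t] for t in targets]
            total = sum(weights)
            for t, w in zip(targets, weights):
                new[t] += damping * importance[src] * w / total
        importance = new
    return importance
```

The transition probabilities are recomputed inside the loop because the site importance they depend on changes with each update of the page importance.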
Abstract:
A method and system for generating a classifier to classify sub-objects of an object based on a relationship between sub-objects is provided. The classification system provides training sub-objects along with the actual classification of each training sub-object. The classification system may iteratively train sub-classifiers based on feature vectors representing the features of each sub-object, the actual classification of the sub-object, and a weight associated with the sub-object. After a sub-classifier is trained, the classification system classifies the training sub-objects using the trained sub-classifier. The classification system then adjusts the classifications based on relationships between training sub-objects. The classification system assigns a weight to the sub-classifier and a weight to each sub-object based on the accuracy of the adjusted classifications.
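The training loop resembles boosting with an extra adjustment step. The sketch below assumes an AdaBoost-style weight update, decision stumps over a single numeric feature as sub-classifiers, and a majority vote among related sub-objects as the adjustment; none of these choices is stated in the abstract.

```python
import math

def train_stump(xs, ys, ws):
    """Pick the threshold and sign on a 1-D feature with lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x >= thr else -sign for x in xs]
            err = sum(w for p, y, w in zip(preds, ys, ws) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    _, thr, sign = best
    return lambda x: sign if x >= thr else -sign

def adjust(preds, neighbors):
    """Assumed relationship rule: majority vote of a sub-object and its neighbors."""
    out = []
    for i, p in enumerate(preds):
        vote = p + sum(preds[j] for j in neighbors[i])
        out.append(p if vote == 0 else (1 if vote > 0 else -1))
    return out

def boost(xs, ys, neighbors, rounds=3):
    """Train sub-classifiers; weight them by the accuracy of adjusted labels."""
    n = len(xs)
    ws = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        clf = train_stump(xs, ys, ws)
        adjusted = adjust([clf(x) for x in xs], neighbors)
        err = sum(w for p, y, w in zip(adjusted, ys, ws) if p != y)
        err = min(max(err, 1e-9), 1 - 1e-9)       # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)   # sub-classifier weight
        ensemble.append((alpha, clf))
        # Raise the weight of sub-objects whose adjusted label is wrong.
        ws = [w * math.exp(-alpha * y * p) for w, y, p in zip(ws, ys, adjusted)]
        total = sum(ws)
        ws = [w / total for w in ws]
    return ensemble, ws
```

Labels are assumed to be +1 or -1, and `neighbors[i]` lists the indices of the sub-objects related to sub-object `i`.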
Abstract:
A method and system for augmenting a training set used to train a classifier of documents is provided. The augmentation system augments a training set with training data derived from features of documents based on a document hierarchy. The training data of the initial training set may be derived from the root documents of the hierarchies of documents. The augmentation system generates additional training data that includes an aggregate feature that represents the overall characteristics of a hierarchy of documents, rather than just the root document. After the training data is generated, the augmentation system augments the initial training set with the newly generated training data.
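A minimal sketch of the augmentation step follows. The dictionary layout of a hierarchy and the use of an element-wise mean as the aggregate feature are assumptions chosen for illustration; the abstract does not specify how the aggregate is computed.

```python
def aggregate_feature(hierarchy):
    """Assumed aggregate: element-wise mean over all documents in the hierarchy."""
    vectors = list(hierarchy["features"].values())
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def augment(training_set, hierarchies):
    """Return the initial training set plus one aggregate example per hierarchy."""
    augmented = list(training_set)
    for h in hierarchies:
        augmented.append((aggregate_feature(h), h["label"]))
    return augmented
```

Each hierarchy is assumed to carry a `features` mapping from document id to feature vector and a `label` giving its classification; the initial training set is left unmodified and the new examples are appended to a copy.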
Abstract:
The probability that a user clicks on an online advertisement may be dependent on an attractiveness of the online advertisement. In determining such click probability, an advertisement attractiveness model for estimating an attractiveness of an online advertisement to users may be developed. A click behavior model is then created by combining the advertisement attractiveness model with a relevance model. The relevance model may be used for estimating relevance between the online advertisement and a search query. The click behavior model may be applied to features extracted from the online advertisement to calculate a click probability for the online advertisement.
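The combination of the two models might look like the sketch below. It assumes each model is a logistic function of weighted features and that the click probability is the product of attractiveness and relevance; the abstract only says the models are combined, so the product form and every name here are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(features, weights):
    """Weighted sum over named features; unknown features contribute zero."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def click_probability(ad_features, query_ad_features, attract_w, relevance_w):
    """Assumed click behavior model: P(click) = attractiveness * relevance."""
    attractiveness = sigmoid(linear_score(ad_features, attract_w))
    relevance = sigmoid(linear_score(query_ad_features, relevance_w))
    return attractiveness * relevance
```

Here `ad_features` would hold features extracted from the advertisement alone, while `query_ad_features` would hold features describing the match between the advertisement and the search query.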
Abstract:
A method and system are provided for calculating the importance of documents using transition probabilities from a source document to a target document that look ahead to the information content of the source document's target documents. A look-ahead importance system generates transition probabilities of transitioning between any pair of source and target documents based on analysis of links to target documents of the source document. The system may calculate the transition probabilities based on the number of links on documents a look-ahead distance away. The system then solves for the stationary probabilities of the transition probabilities. The stationary probabilities represent the importance of the documents.
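The two stages, building look-ahead transition probabilities and solving for their stationary probabilities, can be sketched as follows. The sketch assumes a look-ahead distance of one, a weight of `1 + number of links on the target`, and a damped power iteration as the solver; these are illustrative choices, not details from the abstract.

```python
def lookahead_transitions(links):
    """Weight each target by the number of links found one step ahead of it."""
    probs = {}
    for src, targets in links.items():
        weights = {t: 1 + len(links.get(t, [])) for t in targets}
        total = sum(weights.values())
        probs[src] = {t: w / total for t, w in weights.items()}
    return probs

def stationary(probs, pages, damping=0.85, iterations=100):
    """Approximate the stationary probabilities by damped power iteration."""
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for src in pages:
            # Documents without outgoing links jump uniformly.
            row = probs.get(src) or {p: 1.0 / n for p in pages}
            for target, pr in row.items():
                new[target] += damping * rank[src] * pr
        rank = new
    return rank
```

With these weights, a surfer on a source document prefers targets that themselves offer more onward links, which is one way to read "looking ahead to information content".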
Abstract:
Implementations for providing menu-based advertising are disclosed. Based on user input entered into a search query field on a search page, a search engine front-end determines non-search engine information pages that are relevant to that input. A suggestion menu is caused to be displayed on the search page. The suggestion menu includes interactive elements that, when selected, cause a client device to retrieve the non-search engine information pages associated with those elements. The interactive elements may be advertisements, and the suggestion menu may also be used to display search query suggestions.
Abstract:
Embodiments are provided for caching and accessing Directed Acyclic Graph (DAG) data to and from a computing device of a DAG distributed execution engine during the processing of an iterative algorithm. In accordance with one embodiment, a method includes processing, in the computing device, a first subgraph of a plurality of subgraphs from a distributed storage system. The first subgraph is processed with associated input values in the computing device to generate first output values in an iteration. The method further includes storing a second subgraph in a cache of the device, the second subgraph being a duplicate of the first subgraph. Moreover, the method includes processing the second subgraph with the first output values to generate second output values if the device is to process the first subgraph in each of one or more subsequent iterations.
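The caching pattern can be illustrated with a small sketch. The `Worker` class, the dictionary standing in for the distributed storage system, and the toy update rule are assumptions made here to show the control flow only.

```python
class Worker:
    """Stands in for one computing device of the execution engine."""

    def __init__(self, storage):
        self.storage = storage        # hypothetical distributed storage (a dict)
        self.cache = {}
        self.remote_reads = 0

    def load(self, key, reused_later):
        """Return the subgraph, preferring the locally cached duplicate."""
        if key in self.cache:
            return self.cache[key]
        self.remote_reads += 1
        subgraph = self.storage[key]
        if reused_later:
            self.cache[key] = dict(subgraph)   # keep a duplicate on the device
        return subgraph

def run(worker, key, values, iterations):
    """Each iteration feeds the previous output values back into the subgraph."""
    for i in range(iterations):
        subgraph = worker.load(key, reused_later=i < iterations - 1)
        # Toy update rule: a node adds the values of its predecessors to its own.
        values = {node: values[node] + sum(values[p] for p in preds)
                  for node, preds in subgraph.items()}
    return values
```

The subgraph is fetched from storage once; every later iteration reads the cached duplicate and consumes the output values of the iteration before it.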
Abstract:
A method and system for distributed training of a hierarchical classifier for classifying documents using a classification hierarchy is provided. A training system provides training data that includes the documents and classifications of the documents within the classification hierarchy. The training system distributes the training of the classifiers of the hierarchical classifier to various agents so that the classifiers can be trained in parallel. For each classifier, the training system identifies an agent that is to train the classifier. Each agent then trains its classifiers.
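A sketch of the distribution step follows, using threads from the Python standard library to stand in for agents. The round-robin assignment and the toy majority-label "classifier" are assumptions; the abstract does not say how agents are chosen or what kind of classifier is trained.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def train_classifier(examples):
    """Toy stand-in for training: remember the most frequent child label."""
    counts = Counter(label for _, label in examples)
    return counts.most_common(1)[0][0]

def assign(nodes, num_agents):
    """Assumed policy: assign hierarchy nodes to agents round-robin."""
    assignment = {agent: [] for agent in range(num_agents)}
    for i, node in enumerate(nodes):
        assignment[i % num_agents].append(node)
    return assignment

def agent_train(nodes, training_data):
    """One agent trains every classifier assigned to it."""
    return {node: train_classifier(training_data[node]) for node in nodes}

def train_hierarchy(training_data, num_agents=2):
    """Train one classifier per hierarchy node, with agents running in parallel."""
    assignment = assign(sorted(training_data), num_agents)
    models = {}
    with ThreadPoolExecutor(max_workers=num_agents) as pool:
        results = pool.map(lambda nodes: agent_train(nodes, training_data),
                           assignment.values())
        for result in results:
            models.update(result)
    return models
```

Here `training_data` maps each node of the classification hierarchy to the `(document, child label)` pairs relevant to that node's classifier, so the classifiers are independent of one another and can be trained concurrently.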
Abstract:
A method and system for determining a ranking of web sites based on an aggregation of rankings of the web pages within the web sites is provided. A ranking system identifies for each web site a stationary distribution of a stochastic complement of the transition probabilities between web pages of the web site. The ranking system then calculates transition probabilities between web sites based on the web page transition probabilities weighted by the stationary distribution of the stochastic complements. The ranking system then calculates the stationary distribution of the transition probabilities of the web sites to represent a ranking of the web sites.
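The aggregation can be sketched as below. Computing an exact stochastic complement requires a matrix inverse, so this sketch substitutes a simpler stand-in: it row-normalizes each site's within-site block and uses that block's stationary distribution as the page weights. That substitution, the lazy power iteration, and all names are assumptions for illustration, not the method of the abstract.

```python
def stationary(matrix, iterations=200):
    """Stationary distribution of a row-stochastic matrix by lazy power iteration.

    Assumes the chain is irreducible; the lazy step avoids periodic oscillation.
    """
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        step = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        dist = [0.5 * d + 0.5 * s for d, s in zip(dist, step)]
    return dist

def site_transitions(P, sites):
    """Aggregate page transitions P into site transitions.

    P:     page-level row-stochastic matrix (list of lists)
    sites: list of lists of page indices, one list per site
    """
    local = []
    for pages in sites:
        block = [[P[i][j] for j in pages] for i in pages]
        normalized = []
        for row in block:
            s = sum(row)
            normalized.append([x / s for x in row] if s > 0
                              else [1.0 / len(row)] * len(row))
        # Stand-in for the stationary distribution of the stochastic complement.
        local.append(stationary(normalized))
    return [[sum(local[a][k] * sum(P[i][j] for j in sites[b])
                 for k, i in enumerate(sites[a]))
             for b in range(len(sites))]
            for a in range(len(sites))]

def rank_sites(P, sites):
    """Site ranking as the stationary distribution of the site transitions."""
    return stationary(site_transitions(P, sites))
```

Because each site's page weights sum to one and each row of `P` sums to one, every row of the resulting site-level matrix also sums to one.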