摘要:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for correcting entity names. One method includes receiving texts and deriving a plurality of name-context pairs from the texts. The method further includes calculating a context consistency measure for each name-context pair and storing context-entity name data representing the name-context pairs. Another method includes identifying an entity name and one or more context terms from a query and generating candidate names for the entity name. The method further includes determining a score for each of the candidate names, selecting a number of top scoring candidate names, and using the selected candidate names to respond to the query.
摘要:
A system for generating a model is provided. The system generates, or selects, candidate conditions and generates, or otherwise obtains, statistics regarding the candidate conditions. The system also forms rules based, at least in part, on the statistics and the candidate conditions and selectively adds the rules to the model.
摘要:
Systems and methods that improve search rankings for a search query by using data associated with queries related to the search query are described. In one aspect, a search query is received, a related query related to the search query is determined, an article (such as a web page) associated with the search query is determined, and a ranking score for the article based at least in part on data associated with the related query is determined. Several algorithms and types of data associated with related queries useful in carrying out such systems and methods are described.
摘要:
A learning model is built on a combination of advertiser, publisher and user data. The learning model can be applied to all advertisers in an advertising system. The learning model provides predicted conversion rates for a given advertisement (“ad”) appearing on different publisher networks. A predicted conversion rate represents the probability that a click on a given ad appearing on a given publisher will lead to a conversion. The predicted conversion rates are used to generate a multiplier. The multiplier is used to automatically adjust the advertiser's bid (e.g., maximum cost-per-click (CPC)) for the given ad prior to an auction for the ad. Adjusting the advertiser's bid equalizes a cost-per-conversion among the publishers for the ad.
摘要:
Systems and methods that improve search rankings for a search query by using data associated with queries related to the search query are described. In one aspect, a search query is received, a population associated with the search query is determined, an article (such as a webpage) associated with the search query is determined, and a ranking score for the article based at least in part on data associated with the population is determined. Algorithms and types of data associated with a population useful in carrying out such systems and methods are described.
摘要:
A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other document based, at least in part, on the score.
摘要:
A system for generating a model is provided. The system generates, or selects, candidate conditions and generates, or otherwise obtains, statistics regarding the candidate conditions. The system also forms rules based, at least in part, on the statistics and the candidate conditions and selectively adds the rules to the model.
摘要:
A stopword detection component detects stopwords (also stop-phrases) in search queries input to keyword-based information retrieval systems. Potential stopwords are initially identified by comparing the terms in the search query to a list of known stopwords. Context data is then retrieved based on the search query and the identified stopwords. In one implementation, the context data includes documents retrieved from a document index. In another implementation, the context data includes categories relevant to the search query. Sets of retrieved context data are compared to one another to determine if they are substantially similar. If the sets of context data are substantially similar, this fact may be used to infer that the removal of the potential stopword(s) is not material to the search. If the sets of context data are not substantially similar, the potential stopword can be considered material to the search and should not be removed from the query.
摘要:
A system automatically creates a list from items in existing lists. The system receives one or more example items corresponding to the list and assigns weights to the items in the existing lists based on the one or more example items. The system then forms the list based on the items and the weights assigned to the items.
摘要:
A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other document based, at least in part, on the score.