Abstract:
This document describes tools for adjusting anchor text weight to provide more relevant search engine results. Specifically, these tools use a site-relationship model that considers not only the relationship between an anchor text source site and a destination page but also the relationships among multiple anchor text source sites. Considering these relationships aids in determining a new anchor text weight, which in turn yields more relevant search results.
Abstract:
Computer-readable media, computer systems, and computing devices enhance a web index with uniform resource locator (URL)/non-encoding character (NEC) word pairs to improve relevance ranking of search results provided in response to a search query that includes NEC words. URLs are received from web pages, and substrings are extracted from them. Additional elements are received from each web page and word-broken into sequences of NEC words. The NEC words are converted into encoding-language representations, which are matched against the URL substrings to identify candidate URL/NEC pairs for use in relevance ranking.
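The matching step can be illustrated with a minimal sketch. It assumes the NEC words have already been word-broken, and it stands in for the conversion step with a tiny hand-written lookup table; the table `ROMANIZATION`, the helper names, and the rule of splitting URLs on non-alphanumeric characters are all hypothetical choices for illustration, not details given in the abstract.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for the NEC-to-encoding-language conversion step.
ROMANIZATION: Dict[str, str] = {
    "百度": "baidu",
    "新闻": "xinwen",
    "地图": "ditu",
}


def url_substrings(url: str) -> List[str]:
    """Split a URL into lowercase alphanumeric substrings (an assumed rule)."""
    return [s for s in re.split(r"[^a-z0-9]+", url.lower()) if s]


def candidate_pairs(
    url: str,
    nec_words: Sequence[str],
    table: Dict[str, str] = ROMANIZATION,
) -> List[Tuple[str, str]]:
    """Return (URL substring, NEC word) pairs whose converted form appears in the URL."""
    substrings = set(url_substrings(url))
    pairs = []
    for word in nec_words:
        converted = table.get(word)  # None when the word has no known conversion
        if converted is not None and converted in substrings:
            pairs.append((converted, word))
    return pairs
```

For example, `candidate_pairs("http://ditu.baidu.com/", ["百度", "地图", "新闻"])` pairs "baidu" with 百度 and "ditu" with 地图, while 新闻 produces no pair because "xinwen" does not occur in the URL.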
Abstract:
A method using a RankBoost-based algorithm to automatically select features for further ranking model training is provided. The method iteratively applies a set of ranking candidates to a training data set comprising a plurality of ranking objects having a known pairwise ranking order. Each iteration round applies a weight distribution over ranking object pairs, obtains a ranking result from each ranking candidate, identifies a favored ranking candidate for the round based on those results, and updates the weight distribution for the next round by increasing the weights of ranking object pairs that the favored ranking candidate ranks poorly. The method then infers a target feature set from the favored ranking candidates identified across the iterations.
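The iteration described above can be sketched as follows. This is an illustration under stated assumptions rather than the claimed method: each ranking candidate is taken to be a single feature that ranks objects by its raw value, candidates are scored by weighted pairwise agreement, and the weight update uses the standard RankBoost exponential form. The function name `select_features` and the early-stop rule are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sign(x: float) -> int:
    """Return +1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def select_features(
    features: Sequence[Sequence[float]],  # features[i][k]: feature k of object i
    pairs: Sequence[Tuple[int, int]],     # (higher, lower): known pairwise order
    rounds: int,
) -> List[int]:
    """Return indices of features favored in at least one boosting round."""
    n_feats = len(features[0])
    weights = [1.0 / len(pairs)] * len(pairs)  # uniform initial distribution
    favored: List[int] = []

    def agreement(k: int, hi: int, lo: int) -> int:
        # +1 if feature k orders the pair correctly, -1 if reversed, 0 on a tie.
        return sign(features[hi][k] - features[lo][k])

    for _ in range(rounds):
        # Score every candidate by its weighted pairwise agreement r in [-1, 1].
        best_k, best_r = -1, -math.inf
        for k in range(n_feats):
            r = sum(w * agreement(k, hi, lo) for w, (hi, lo) in zip(weights, pairs))
            if r > best_r:
                best_k, best_r = k, r
        if best_r <= 0:  # assumed stop rule: no candidate beats chance
            break
        favored.append(best_k)

        # Standard RankBoost step size; r is clamped to keep the log finite.
        r = min(best_r, 1.0 - 1e-9)
        alpha = 0.5 * math.log((1.0 + r) / (1.0 - r))

        # Increase the weight of pairs the favored candidate ranked poorly.
        weights = [
            w * math.exp(-alpha * agreement(best_k, hi, lo))
            for w, (hi, lo) in zip(weights, pairs)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]

    # Target feature set: favored candidates, de-duplicated in order of selection.
    return list(dict.fromkeys(favored))
```

Because the update shifts weight onto the pairs the current favorite gets wrong, a later round tends to favor a feature that complements it, which is what makes the favored candidates a reasonable basis for the target feature set.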
Abstract:
Search results provided by a search engine (e.g., for the Internet) are improved and/or made more accurate by addressing the limited availability of human-labeled training data for certain domains (e.g., languages other than English, certain date ranges, queries over a certain length, etc.). More particularly, a ranking model trained on in-domain data, for which only a small amount of human-labeled training data (e.g., query/URL pairs) is available (e.g., languages other than English), is adjusted based upon out-domain data, for which a large amount of human-labeled training data is available (e.g., English). Thus, even though the resulting adapted in-domain ranking model is used on in-domain data (e.g., non-English) to provide search results, the search results are improved because they are influenced by an abundance of human-labeled training data, albeit out-domain.