Abstract:
A system and method for improved search relevance using proximity boosting. A query for a web search is received from a user via a network, wherein the query comprises a plurality of query tokens. One or more concepts are identified in the query, wherein each of the concepts comprises at least two query tokens. A relative concept strength is determined for each of the identified concepts. The query is then rewritten for submission to a search engine: for each of the one or more concepts, a syntax rule associated with the relative concept strength of that concept is applied to the query tokens comprising the concept, such that the rewritten query represents the one or more concepts and the proximity of the one or more concepts is boosted in the search results returned by the search engine to the user in response to the rewritten query.
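The rewriting step can be illustrated with a minimal sketch. The abstract does not specify the syntax rules or the strength scale, so the thresholds, the `PROXIMITY_WINDOW` constant, and the quoted-phrase and `~N` output syntax below are all assumptions chosen for illustration, as is the function name `rewrite_query`.

```python
# Hypothetical values; the abstract does not define thresholds or syntax.
STRONG_THRESHOLD = 0.8
MEDIUM_THRESHOLD = 0.4
PROXIMITY_WINDOW = 3


def rewrite_query(tokens, concepts):
    """Rewrite a tokenized query.

    concepts: list of (start, end, strength) spans over tokens, end exclusive,
    assumed not to overlap.
    """
    by_start = {start: (end, strength) for start, end, strength in concepts}
    out = []
    i = 0
    while i < len(tokens):
        if i in by_start:
            end, strength = by_start[i]
            phrase = " ".join(tokens[i:end])
            if strength >= STRONG_THRESHOLD:
                # Strong concept: require the exact phrase.
                out.append(f'"{phrase}"')
            elif strength >= MEDIUM_THRESHOLD:
                # Medium concept: require the tokens to appear near each other.
                out.append(f'"{phrase}"~{PROXIMITY_WINDOW}')
            else:
                # Weak concept: leave the tokens unconstrained.
                out.append(phrase)
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


rewritten = rewrite_query(
    ["cheap", "new", "york", "hotel", "deals"],
    [(1, 3, 0.9), (3, 5, 0.5)],
)
print(rewritten)
```

With these assumed rules, the strong concept "new york" becomes an exact phrase and the weaker concept "hotel deals" becomes a proximity constraint.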
Abstract:
One particular embodiment accesses a first set of search queries comprising one or more first search queries; extracts one or more features based on the first set of search queries; trains a search-query classifier using the features; accesses a second search query provided by a user; determines whether the second search query has implicit and general local intent using the search-query classifier; if the second search query has implicit and general local intent, determines a location associated with the user and identifies a search result in response to the second search query based at least in part on the location associated with the user; and presents the search result to the user.
Abstract:
Computer-enabled methods, apparatus, and computer-readable media are provided for verifying that a given network name, such as a URL, is an official, e.g., registered, approved, or otherwise officially recognized, network name that refers to or identifies a principal, such as a business. These techniques involve receiving a principal name and a given network name, receiving at least one feature attribute from at least one database of feature attributes, wherein the at least one feature attribute comprises a characteristic of the principal name or a characteristic of the network name, and invoking a logistic regression method to generate a probability, based upon the at least one feature attribute, that the given network name is an official network name for the principal name. The logistic regression method may include a gradient boosting tree model that generates the probability based upon the at least one feature attribute.
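The probability step can be sketched as follows. This is a plain logistic function over a linear score, not the gradient boosting tree model the abstract mentions; the feature names, weights, and bias are hypothetical and would in practice be learned from labeled data.

```python
import math


def official_probability(features, weights, bias):
    """Map feature attributes to a probability with the logistic function.

    features and weights are dicts keyed by (hypothetical) feature name.
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature attributes for a (principal name, URL) pair.
features = {"name_in_domain": 1.0, "is_registered_owner": 0.0}
weights = {"name_in_domain": 2.0, "is_registered_owner": 3.0}
p = official_probability(features, weights, bias=-1.0)
print(round(p, 3))
```

A caller would compare `p` with a decision threshold of its own choosing to accept or reject the network name as official.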
Abstract:
One embodiment accesses a search query comprising one or more query words, at least one of the query words representing one or more query concepts; accesses a network document identified for the search query by a search engine, the network document comprising one or more document words, at least one of the document words representing one or more document concepts; performs semantic-text matching between the search query and the network document to determine one or more negative semantic-text matches; and constructs one or more negative features based on the negative semantic-text matches.
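One possible reading of the feature-construction step is sketched below. The abstract does not define a negative semantic-text match; this sketch assumes it means a query concept that has no counterpart among the document concepts, and the feature names are hypothetical.

```python
def negative_features(query_concepts, document_concepts):
    """Build negative features from query concepts absent in the document.

    Assumption: a "negative match" is a query concept with no matching
    document concept. Concept extraction itself is out of scope here.
    """
    query_set = set(query_concepts)
    missing = query_set - set(document_concepts)
    fraction = len(missing) / len(query_set) if query_set else 0.0
    return {
        "negative_match_count": len(missing),
        "negative_match_fraction": fraction,
        "missing_concepts": sorted(missing),
    }


feats = negative_features(
    ["jaguar car", "price"],
    ["jaguar animal", "price"],
)
print(feats)
```

Features of this kind could then be passed to a ranking model so that documents about the wrong sense of a query are demoted.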
Abstract:
A method for handling abbreviations in web queries includes building a dictionary of a plurality of possible word expansions for a plurality of potential abbreviations related to query terms received or anticipated to be received by a search engine; accepting a query including an abbreviation; expanding the abbreviation into one of the plurality of word expansions if a probability that the expansion is correct is above a threshold value, wherein the probability is determined by taking into consideration a context of the abbreviation within the query, wherein the context includes at least anchor text; and sending the query with the expanded abbreviation to the search engine to generate a search results page related to the query.
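The context-dependent expansion can be sketched as follows. The abstract does not say how the probability is computed; this sketch assumes it is estimated from hypothetical counts of how often each expansion co-occurs with the other query words in anchor text. The `ANCHOR_COUNTS` table, its values, and the threshold are invented for illustration.

```python
# Hypothetical anchor-text co-occurrence counts:
# (abbreviation, expansion) -> {context word: count}
ANCHOR_COUNTS = {
    ("ny", "new york"): {"hotel": 40, "pizza": 30},
    ("ny", "new year"): {"party": 5},
}


def expand(query, threshold=0.7):
    """Expand abbreviations whose best expansion clears the threshold."""
    tokens = query.split()
    out = []
    for i, tok in enumerate(tokens):
        context = tokens[:i] + tokens[i + 1:]
        # Score each known expansion by its support from the context words.
        scores = {}
        for (abbr, expansion), counts in ANCHOR_COUNTS.items():
            if abbr == tok:
                scores[expansion] = sum(counts.get(w, 0) for w in context)
        total = sum(scores.values())
        if total > 0:
            best = max(scores, key=scores.get)
            if scores[best] / total > threshold:
                out.append(best)
                continue
        # No expansion is supported strongly enough: keep the token as typed.
        out.append(tok)
    return " ".join(out)


print(expand("ny hotel"))
print(expand("ny"))
```

With these counts, "ny hotel" is expanded because the context word "hotel" supports only one expansion, while a bare "ny" has no context and is left unchanged.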
Abstract:
A method for handling abbreviations in web queries includes building a dictionary of possible word expansions for potential abbreviations related to query terms received and anticipated to be received by a search engine; accepting a query including an abbreviation from a searching user, where a probability of finding a most probably-correct expansion in the dictionary is a first probability, and a probability that the expansion is the abbreviation itself is a second probability; determining a ratio between the first and second probabilities; expanding the abbreviation in accordance with the most probably-correct expansion when the ratio is above a first threshold value; and highlighting the abbreviation with a suggested expansion of the most probably-correct expansion for the user so that the user may accept the suggested expansion when the ratio is between a second, lower threshold value and the first threshold value.
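The two-threshold decision described above can be sketched as a small function. The threshold values and the action labels are assumptions; the abstract specifies only that the ratio of the first probability to the second is compared with a first threshold and a second, lower one.

```python
def decide(p_expansion, p_literal, high=4.0, low=1.5):
    """Choose an action from the ratio of the two probabilities.

    p_expansion: probability of the most probably-correct expansion.
    p_literal:   probability that the abbreviation stands for itself.
    high, low:   hypothetical first and second (lower) thresholds.
    """
    ratio = float("inf") if p_literal == 0 else p_expansion / p_literal
    if ratio > high:
        return "expand"   # rewrite the query automatically
    if ratio >= low:
        return "suggest"  # highlight and let the user accept the expansion
    return "keep"         # leave the abbreviation as typed


print(decide(0.9, 0.1))
print(decide(0.6, 0.3))
print(decide(0.4, 0.5))
```

The middle band exists so that uncertain expansions are offered to the user instead of being applied silently.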
Abstract:
A method for normalizing query words in web search includes populating a dictionary with join and split candidates and corresponding joined and split words from an aggregate of query logs; determining a confidence score for the join and split candidates, with the candidates having the highest confidence scores characterized in the dictionary as must-join and must-split, respectively; accepting queries with words amenable to being split or joined, or amenable to an addition or deletion of a hyphen or an apostrophe; generating, based on the accepted queries, split candidates obtained from the dictionary, and join, hyphen, or apostrophe candidates algorithmically; submitting to a search engine the generated candidates characterized as must-join or must-split in the dictionary, to improve search results returned in response to the queries; and applying a language dictionary to generated candidates not characterized as must-split or must-join, to rank them, and submitting the highest-ranked of those to the search engine.
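The dictionary lookup and fallback ranking can be sketched as follows. The entries in `NORMALIZATION_DICT` and `LANGUAGE_WORDS`, the label names, and the scoring rule (fraction of words found in the language dictionary) are all assumptions for illustration; the abstract does not specify how confidence scores or rankings are computed.

```python
# Hypothetical dictionary built from query logs: query -> (rewrite, label)
NORMALIZATION_DICT = {
    "face book": ("facebook", "must-join"),
    "newyork": ("new york", "must-split"),
    "lap top": ("laptop", "candidate"),
}
# Hypothetical language dictionary used to rank uncertain candidates.
LANGUAGE_WORDS = {"laptop", "top", "new", "york"}


def normalize(query):
    """Return candidate queries to submit, best first."""
    entry = NORMALIZATION_DICT.get(query)
    if entry is None:
        return [query]
    rewritten, label = entry
    if label in ("must-join", "must-split"):
        # High-confidence entries are submitted directly.
        return [rewritten]

    def score(text):
        # Fraction of words that appear in the language dictionary.
        words = text.split()
        return sum(w in LANGUAGE_WORDS for w in words) / len(words)

    # Lower-confidence entries are ranked against the original query.
    return sorted([query, rewritten], key=score, reverse=True)


print(normalize("face book"))
print(normalize("lap top"))
```

In this sketch only the top-ranked candidate would be sent to the search engine for entries that are not marked must-join or must-split.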