Abstract:
One embodiment includes accessing a search query comprising one or more query words, at least one of the query words representing one or more query concepts; accessing a network document identified for the search query by a search engine, the network document comprising one or more document words, at least one of the document words representing one or more document concepts; performing semantic-text matching between the search query and the network document to determine one or more negative semantic-text matches; and constructing one or more negative features based on the negative semantic-text matches.
Abstract:
A method for handling abbreviations in web queries includes building a dictionary of possible word expansions for potential abbreviations related to query terms received and anticipated to be received by a search engine; accepting a query including an abbreviation from a searching user, where a probability of finding a most probably-correct expansion in the dictionary is a first probability, and a probability that the expansion is the abbreviation itself is a second probability; determining a ratio between the first and second probabilities; expanding the abbreviation in accordance with the most probably-correct expansion when the ratio is above a first threshold value; and highlighting the abbreviation with a suggested expansion of the most probably-correct expansion for the user so that the user may accept the suggested expansion when the ratio is between a second, lower threshold value and the first threshold value.
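The two-threshold decision described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name, the threshold values, and the "keep" outcome for ratios below the lower threshold are assumptions, since the abstract does not say what happens in that range.

```python
def decide_expansion(p_expansion, p_self, high=4.0, low=1.5):
    """Choose an action from the ratio of the two probabilities.

    p_expansion: probability of the most probably-correct dictionary expansion.
    p_self: probability that the abbreviation stands for itself.
    high, low: hypothetical first and second (lower) thresholds.
    """
    if p_self <= 0:
        raise ValueError("p_self must be positive to form a ratio")
    ratio = p_expansion / p_self
    if ratio > high:
        return "expand"    # rewrite the query with the expansion
    if ratio >= low:
        return "suggest"   # highlight the abbreviation and offer the expansion
    return "keep"          # assumption: leave the abbreviation untouched
```

For example, `decide_expansion(0.6, 0.3)` gives a ratio of 2, which falls between the two illustrative thresholds, so the expansion is only suggested to the user.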
Abstract:
A system and method for improved search relevance using proximity boosting. A query for a web search is received from a user, via a network, wherein the query comprises a plurality of query tokens. One or more concepts are identified in the query, wherein each of the concepts comprises at least two query tokens. A relative concept strength is determined for each of the identified concepts. The query is then rewritten for submission to a search engine: for each of the one or more concepts, a syntax rule associated with the relative concept strength of that concept is applied to the query tokens comprising the concept, so that the rewritten query represents the one or more concepts and the proximity of the one or more concepts is boosted in a search result returned by the search engine to the user in response to the rewritten query.
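One way to picture the rewriting step is the sketch below. The strength cut-offs and the output syntax are assumptions made for illustration: strong concepts become quoted phrases, and medium concepts get a `~3` proximity suffix in the style of Lucene query syntax. The abstract does not name the actual syntax rules or the target search engine.

```python
def rewrite_query(tokens, concepts):
    """Rewrite a tokenized query so that identified concepts stay close together.

    tokens: list of query tokens.
    concepts: non-overlapping (start, end, strength) spans, end exclusive.
    The thresholds 0.8 and 0.4 are hypothetical.
    """
    by_start = {start: (end, strength) for start, end, strength in concepts}
    out, i = [], 0
    while i < len(tokens):
        if i in by_start:
            end, strength = by_start[i]
            phrase = " ".join(tokens[i:end])
            if strength >= 0.8:
                out.append(f'"{phrase}"')      # strong concept: exact phrase
            elif strength >= 0.4:
                out.append(f'"{phrase}"~3')    # medium concept: small proximity window
            else:
                out.append(phrase)             # weak concept: leave tokens as they are
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

With tokens `cheap new york hotel deals`, a strong concept over `new york`, and a medium concept over `hotel deals`, the sketch produces `cheap "new york" "hotel deals"~3`.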
Abstract:
A system and method for ranking web searches with quantified semantic features. A query for a web search is received from a user. The query is segmented and tagged into one or more linguistic segments using linguistic analysis. At least some of the linguistic segments are tagged with a linguistic type. A query execution plan is generated comprising the linguistic segments and, for each of the linguistic segments tagged with a linguistic type, at least one tag attribute comprising at least one domain specific feature of the linguistic type. A search is performed for documents matching the query. Each of the documents is scored for each of the linguistic segments of the query execution plan using the tag attributes of the respective linguistic segment. The documents are ranked using a function that uses the scores of the documents. A ranked list of the documents is transmitted back to the user.
Abstract:
An aggregate ranking model is generated, which comprises a general ranking model and one or more topical ranking models. Each topical ranking model is associated with a topic, or topic class, and is used in ranking search result items determined to belong to the topic, or topic class. As one example, the topical ranking model is trained using a set of topical training data (e.g., training data determined to belong to the topic, or topic class), a general ranking model, and a residue, or error, determined from a general ranking generated by the general ranking model for the topical training data, with the topical ranking model being trained to minimize the general ranking model's error in the aggregate ranking model.
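The residual-training idea can be illustrated with a toy example. Everything concrete here is an assumption: documents are reduced to a single numeric feature, the topical model is a one-variable least-squares line fitted to the general model's residuals, and the aggregate score is the sum of the two models. The abstract only states that the topical model is trained to minimize the general model's error.

```python
def fit_topical_model(xs, ys, general):
    """Fit a line to the residuals y - general(x) on topical training data."""
    residuals = [y - general(x) for x, y in zip(xs, ys)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(residuals) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x:
        cov = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residuals))
        slope = cov / var_x
    else:
        slope = 0.0  # all feature values equal: fall back to a constant correction
    intercept = mean_r - slope * mean_x
    return lambda x: slope * x + intercept


def aggregate_score(x, general, topical=None):
    """General score plus the topical correction when the item belongs to a topic."""
    return general(x) + (topical(x) if topical else 0.0)
```

Items that do not belong to any topic are scored by the general model alone, while items in a topic receive the learned correction on top of it.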
Abstract:
A method is provided for selecting relevant documents returned from a search query. When a search engine finds search terms in documents, the document score is based on the frequency of the occurrence of those terms, the category of the term, and the section of the document in which the term is found. Each (category type, document section) pair is assigned a weight that is used to modify the contribution of term frequency. The weights are determined in an offline process using historical data and human validation. Through this empirical process, the weight assignments are made to correlate high relevance scores with documents that humans would find relevant to a search query.
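A minimal sketch of this scoring scheme follows. It assumes the weight multiplies the term frequency and that unlisted (category, section) pairs get a default weight of 1.0. The category and section names in the usage note are hypothetical, and the offline weight-fitting process is not shown.

```python
def score_document(hits, weights, default_weight=1.0):
    """Sum weighted term frequencies for one document.

    hits: iterable of (category, section, term_frequency) tuples.
    weights: dict mapping (category, section) to a weight.
    """
    return sum(
        weights.get((category, section), default_weight) * tf
        for category, section, tf in hits
    )
```

For instance, with weights `{("skill", "title"): 3.0, ("skill", "body"): 1.0, ("location", "body"): 0.5}`, a single title match on a skill term counts as much as three body matches.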
Abstract:
Techniques are described for automatically determining which terms in a search query may be augmented by contextually similar terms such that more relevant results can be displayed to a user. Contextually similar words are determined based on training data, including a web corpus and a query log. Once contextually similar words are determined, they may be inserted into a search query and used to find more relevant results. Consequently, documents that contain helpful information but may not have exact word matches may be found more readily by a search engine.
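The insertion step can be sketched as below, assuming the contextually similar words have already been learned and are available as a lookup table. The `OR` grouping syntax and the cap on alternatives are illustrative choices, not details from the abstract.

```python
def augment_query(tokens, similar, max_alternatives=2):
    """Expand each token that has contextually similar words into an OR group.

    similar: dict mapping a term to a ranked list of similar terms
             (hypothetical output of the training step).
    """
    parts = []
    for token in tokens:
        alternatives = similar.get(token, [])[:max_alternatives]
        if alternatives:
            parts.append("(" + " OR ".join([token] + alternatives) + ")")
        else:
            parts.append(token)  # no similar terms known: keep the token as is
    return " ".join(parts)
```

Given `similar = {"cheap": ["inexpensive", "budget", "low-cost"]}`, the query `cheap flights` becomes `(cheap OR inexpensive OR budget) flights`, so documents without the exact word "cheap" can still match.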
Abstract:
Computer-enabled methods, apparatus, and computer-readable media are provided for verifying that a given network name, such as a URL, is an official, e.g., registered, approved, or otherwise officially recognized, network name that refers to or identifies a principal, such as a business. These techniques involve receiving a principal name and a given network name, receiving at least one feature attribute from at least one database of feature attributes, wherein the at least one feature attribute comprises a characteristic of the principal name or a characteristic of the network name, and invoking a logistic regression method to generate a probability, based upon the at least one feature attribute, that the given network name is an official network name for the principal name. The logistic regression method may include a gradient boosting tree model that generates the probability based upon the at least one feature attribute.
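As a rough illustration of the probability step, the sketch below applies a plain logistic function to a weighted sum of feature attributes. The feature names and weights are hypothetical, and the gradient boosting tree variant mentioned above is not modeled here.

```python
import math


def official_probability(features, weights, bias=0.0):
    """Probability that a network name is official for a principal name.

    features: dict of feature attribute name to numeric value.
    weights: dict of feature attribute name to learned weight (hypothetical).
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

A feature such as a (hypothetical) `name_in_domain` flag with a positive weight pushes the probability above 0.5, while the absence of any evidence leaves it at 0.5 when the bias is zero.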
Abstract:
A computerized method is provided for electronically directing a call to a class, in which an utterance spoken by a speaker and received by a call-routing system is classified by the call-routing system as being associated with the class. The call-routing system includes a speech-recognition module, a feature-extraction module, and a classification module. The method includes extracting features from recognized speech; weighting elements of a feature vector with respective speech-recognition scores, wherein each weighting element is associated with one of the features; ranking classes to which the features are associated; and electronically directing the call to a highest-ranking class.
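The weighting and ranking steps might look like the sketch below. It assumes features are simply recognized words, that each word's feature value is its highest recognition confidence, and that each class scores a call with a linear sum over per-word weights. The class names and weights in the usage note are hypothetical.

```python
def rank_classes(recognized, class_weights):
    """Rank routing classes for one utterance, best first.

    recognized: list of (word, recognition_score) pairs from the recognizer.
    class_weights: dict mapping class name to {word: weight}.
    """
    # Feature vector: each word weighted by its best recognition score.
    features = {}
    for word, score in recognized:
        features[word] = max(features.get(word, 0.0), score)

    scores = {
        cls: sum(weights.get(word, 0.0) * value for word, value in features.items())
        for cls, weights in class_weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

The call is then directed to the first class in the returned list. With confident recognition of "bill" and "payment" and low confidence for "broken", a billing class would outrank a support class.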