Abstract:
The concept of variability pertains to whether users exhibit consistent search interaction patterns, for example, in terms of interaction flow or information targeted. Methods are provided for analyzing variability, and then adapting search-related functionality (e.g., processes and/or interfaces) to account for variability characteristics, for example, to account for predictable search interaction behavior.
Abstract:
Structured content and associated metadata from the Web are leveraged to provide specific answer string responses to user questions. The structured content can also be indexed at crawl-time to facilitate searching of the content at search-time. Ranking techniques can also be employed to help provide an optimal answer string and/or a top-K list of answer strings for a query. Ranking can be based on trainable algorithms that utilize feature vectors for candidate answer strings. In one instance, at crawl-time, structured content is indexed and automatically associated with metadata relating to the structured content and the source web page. At search-time, candidate indexed structured content is then utilized to extract an appropriate answer string in response to a user query.
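The top-K ranking over candidate feature vectors can be illustrated with a minimal sketch. It assumes a simple linear scoring function; the abstract does not name a particular trainable algorithm, and the function name `top_k_answers`, the feature values, and the weights are all hypothetical.

```python
import heapq


def top_k_answers(candidates, weights, k=3):
    """Return the k answer strings with the highest linear score.

    candidates: list of (answer_string, feature_vector) pairs.
    weights: one weight per feature (assumed to come from prior training).
    """
    def score(features):
        return sum(w * x for w, x in zip(weights, features))

    best = heapq.nlargest(k, candidates, key=lambda pair: score(pair[1]))
    return [answer for answer, _ in best]


# Hypothetical candidates for one query, with two made-up features each.
candidates = [
    ("Paris", [1.0, 0.9]),
    ("Lyon", [0.2, 0.4]),
    ("France", [0.5, 0.1]),
]
print(top_k_answers(candidates, weights=[0.7, 0.3], k=2))
```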
Abstract:
Systems and methods are provided that estimate user preference via automatic interpretation of user behavior. A user behavior component associated with a search engine can automatically interpret the collective behavior of users (e.g., web search users). Such a feedback component can include user behavior features and predictive models (e.g., from a user behavior component) that are robust to noise, which can be present in observed user interactions with the search results (e.g., malicious and/or irrational user activity).
Abstract:
The present invention leverages iterative transformations of search query strings along with statistics extracted from search query logs and/or web data to provide possible alternative spellings for the search query strings. This provides a spell checking means that can be influenced to provide individualized suggestions for each user. By utilizing search query logs, the present invention can account for substrings not found in a lexicon but still acceptable as a search query of interest. This makes it possible to provide higher quality proposals for alternative spellings, beyond the content of the lexicon. One instance of the present invention operates at a substring level by utilizing word unigram and/or bigram statistics extracted from query logs combined with an iterative search. This provides substantially better spelling alternatives for a given query than employing only substring matching. Other instances can receive input data from sources other than a search query input.
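The iterative search over query-log statistics can be sketched roughly as follows. This is a simplified illustration, not the claimed method: it uses only word unigram counts (no bigrams), generates candidates one edit away, and repeats until no word improves. The names `edits1` and `suggest` and the sample counts are assumptions.

```python
def edits1(word):
    """All strings one insert, delete, replace, or transposition away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(query, unigram_counts, max_iters=5):
    """Repeatedly move each word to a more frequent neighbor seen in the log."""
    words = query.lower().split()
    for _ in range(max_iters):
        improved = []
        for w in words:
            # The current word comes first, so it wins any tie on frequency.
            candidates = [w] + sorted(c for c in edits1(w) if c in unigram_counts)
            improved.append(max(candidates, key=lambda c: unigram_counts.get(c, 0)))
        if improved == words:
            break  # fixed point: no word has a more frequent neighbor
        words = improved
    return " ".join(words)


# Hypothetical unigram counts extracted from a query log.
counts = {"amzon": 5, "amazon": 100, "books": 50}
print(suggest("amzn books", counts))
```

Here "amzn" is two edits away from "amazon", so a single pass cannot reach it; the intermediate form "amzon", which appears in the log but would not appear in a lexicon, bridges the gap on the second iteration.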
Abstract:
A unique ranking system and method are provided that facilitate improving the ranking and ordering of objects to further enhance the quality, accuracy, and delivery of search results in response to a search query. The system and method involve monitoring and tracking an object in terms of the number of times it has been accessed and, optionally, by whom, when, for how long, and at what access rate. The user's interaction with the object can be tracked as well. By tracking the objects, a popularity measure can be determined. Popularity-based rankings can be computed based on the popularity measure or some function thereof. The popularity measure can be affected by the access time, who accessed the object, the access duration, or the user's interaction with the object upon access. The popularity-based rankings can be utilized by a search component to improve the quality and retrieval of search results.
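One way to picture a popularity measure that depends on access time and duration is the sketch below. The exponential recency decay, the duration cap, and every name and constant (`Access`, `popularity`, `half_life_days`, `full_credit_s`) are illustrative assumptions; the abstract does not specify a formula.

```python
from dataclasses import dataclass


@dataclass
class Access:
    user: str
    age_days: float    # how long ago the access happened
    duration_s: float  # how long the user kept the object open


def popularity(accesses, half_life_days=30.0, full_credit_s=60.0):
    """Sum of per-access credits, discounted by age and scaled by duration."""
    score = 0.0
    for a in accesses:
        recency = 0.5 ** (a.age_days / half_life_days)
        engagement = min(a.duration_s / full_credit_s, 1.0)
        score += recency * engagement
    return score


def rank_by_popularity(access_log):
    """Order object ids from most to least popular."""
    return sorted(access_log, key=lambda obj: popularity(access_log[obj]), reverse=True)


# Hypothetical access log keyed by object id.
access_log = {
    "doc_a": [Access("u1", age_days=0, duration_s=60)],
    "doc_b": [Access("u1", age_days=30, duration_s=60),
              Access("u2", age_days=60, duration_s=30)],
}
print(rank_by_popularity(access_log))
```

In this example a single recent, long access outweighs two older ones, which shows how recency and duration can matter more than a raw access count.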
Abstract:
A system and related techniques accept user search or query terms over the Internet or another network or connection. In addition to presenting regularly generated search results, according to embodiments of the invention the search engine and related logic may examine the search string for suggested refinements or improvements to the search terms, to attempt to derive improved results or results closer to the user's search intent. According to embodiments of the invention in one regard, the alternative search logic may attempt to extract related or more meaningful search terms from sources including past usage patterns by users, and other data. That alternative search logic may thus examine the user's search terms to determine a substring match to prior searches, for instance those stored by the search host for all users. In embodiments, the alternative search logic may likewise present user search extensions or refinement paths selected by prior users running the same search, as an indicator of likely content or source relevance. In further embodiments, the alternative search logic may perform a reverse query lookup to trace queries which resulted in the same Web site or other hit as the present search, and present those other queries as possible alternatives for the user to pursue. These and other search refinements may be performed, taking advantage of usage patterns and other information to improve search quality beyond straightforward spelling-type correction.
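The reverse query lookup can be sketched as follows, assuming a log of (query, URL) pairs recording which hit each past query led to. The log format and the names `build_indexes` and `reverse_lookup` are hypothetical; ranking alternatives by the number of shared hits is one possible ordering, not one stated in the abstract.

```python
from collections import Counter, defaultdict


def build_indexes(click_log):
    """Index a log of (query, url) pairs in both directions."""
    urls_by_query = defaultdict(set)
    queries_by_url = defaultdict(set)
    for query, url in click_log:
        urls_by_query[query].add(url)
        queries_by_url[url].add(query)
    return urls_by_query, queries_by_url


def reverse_lookup(query, urls_by_query, queries_by_url):
    """Other queries that reached the same hits, most shared hits first."""
    shared = Counter()
    for url in urls_by_query.get(query, ()):
        for other in queries_by_url[url]:
            if other != query:
                shared[other] += 1
    return [q for q, _ in sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical click log.
click_log = [
    ("cheap flights", "example.com/fares"),
    ("cheap flights", "example.com/deals"),
    ("airline tickets", "example.com/fares"),
    ("airline tickets", "example.com/deals"),
    ("flight deals", "example.com/deals"),
    ("hotel rooms", "example.com/hotels"),
]
by_query, by_url = build_indexes(click_log)
print(reverse_lookup("cheap flights", by_query, by_url))
```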
Abstract:
Systems and methods are described that allow programmatic access to search engine results and query logs in a structured form. The search results can be retrieved from the search engine in an intermediary form that contains the information that is in the HTML pages provided to web browsers (potentially with additional information). This intermediary form can then be broken down on the client machine, using local resources, to assemble the structured objects. The library also provides for caching of the search results. This can be provided both on the local machine and on a remote database. When the results for a query exist in the caches, they can be retrieved from such location instead of querying the search engine. Documents and/or web pages can also be cached. The library can also be directed to operate only from the cache, effectively exposing a local data set instead of the remote search engine.
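The caching behavior, including the cache-only mode, might look like the following minimal sketch. It keeps a single in-memory cache (the abstract also mentions a remote database, omitted here), and the class name `CachedSearch`, the `fetch` callable, and the `cache_only` flag are assumptions rather than the library's actual interface.

```python
class CachedSearch:
    """Wrap a search function with a local cache and an optional cache-only mode."""

    def __init__(self, fetch, cache_only=False):
        self._fetch = fetch          # callable that queries the remote engine
        self._cache = {}             # query string -> list of results
        self.cache_only = cache_only

    def search(self, query):
        if query in self._cache:
            return self._cache[query]
        if self.cache_only:
            # In cache-only mode the remote engine is never contacted.
            raise LookupError(f"no cached results for {query!r}")
        results = self._fetch(query)
        self._cache[query] = results
        return results


# A stand-in for the remote search engine that records every call.
remote_calls = []

def fake_engine(query):
    remote_calls.append(query)
    return [f"result for {query}"]


client = CachedSearch(fake_engine)
client.search("patents")
client.search("patents")   # served from the cache, no second remote call
print(remote_calls)
```

Switching `cache_only` to `True` after a collection run would expose only the queries already gathered, which corresponds to operating on a local data set instead of the remote engine.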
Abstract:
A spell checker based on the noisy channel model has a source model and an error model. The source model determines how likely a word w in a dictionary is to have been generated. The error model determines how likely the word w was to have been incorrectly entered as the string s (e.g., mistyped or incorrectly interpreted by a speech recognition system) according to the probabilities of string-to-string edits. The string-to-string edits allow conversion of one arbitrary length character sequence to another arbitrary length character sequence.
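The combination of source model and error model can be sketched as choosing the dictionary word w that maximizes P(w) · P(s | w). In the sketch below, P(s | w) is approximated by the best-scoring way of splitting w and s into aligned substrings of up to two characters. The probabilities, the `match_prob` default, the two-character limit, and the function names are illustrative assumptions.

```python
from functools import lru_cache


def error_prob(s, w, edit_probs, match_prob=0.95, max_len=2):
    """Approximate P(s | w) by the best partition into substring edits."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best probability of turning w[i:] into s[j:].
        if i == len(w) and j == len(s):
            return 1.0
        top = 0.0
        for a in range(max_len + 1):        # characters consumed from w
            for b in range(max_len + 1):    # characters consumed from s
                if a == 0 and b == 0:
                    continue
                if i + a > len(w) or j + b > len(s):
                    continue
                src, dst = w[i:i + a], s[j:j + b]
                if a == 1 and src == dst:
                    p = match_prob          # a character typed correctly
                else:
                    p = edit_probs.get((src, dst), 0.0)
                if p > 0.0:
                    top = max(top, p * best(i + a, j + b))
        return top

    return best(0, 0)


def correct(s, word_probs, edit_probs):
    """Pick the dictionary word maximizing source prob times error prob."""
    return max(word_probs, key=lambda w: word_probs[w] * error_prob(s, w, edit_probs))


# Hypothetical source model and a single learned edit: "ph" typed as "f".
word_probs = {"physical": 0.6, "fiscal": 0.4}
edit_probs = {("ph", "f"): 0.1}
print(correct("fysical", word_probs, edit_probs))
```

The edit from "ph" to "f" changes a two-character sequence into a one-character sequence, which a model restricted to single-character edits could only express as two separate, less informative operations.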
Abstract:
A unique multi-stage classification system and method that facilitate reducing human resources or costs associated with text classification while still obtaining a desired level of accuracy are provided. The multi-stage classification system and method involve a pattern-based classifier and a machine learning classifier. The pattern-based classifier is trained on discriminative patterns as identified by humans rather than machines, which allows a smaller training set to be employed. Given humans' superior abilities to reason over text, discriminative patterns can be more accurately and more readily identified by them. Unlabeled items can be initially processed by the pattern-based classifier and, if no pattern match exists, the unlabeled data can then be processed by the machine learning classifier. By employing the classifiers in this manner, less human involvement is required in the classification process. Moreover, classification accuracy is maintained and/or improved.
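The two-stage flow can be sketched as follows: human-identified patterns are tried first, and only unmatched items fall through to a learned classifier. Regular expressions stand in for the discriminative patterns, and `toy_model` is a placeholder for a trained machine learning classifier; both are assumptions made for illustration.

```python
import re


def make_two_stage(patterns, fallback):
    """Build a classifier that tries human-written patterns before a model.

    patterns: list of (regex_string, label) pairs, checked in order.
    fallback: callable text -> label, standing in for a trained classifier.
    """
    compiled = [(re.compile(rx, re.IGNORECASE), label) for rx, label in patterns]

    def classify(text):
        for rx, label in compiled:
            if rx.search(text):
                return label, "pattern"
        return fallback(text), "learned"

    return classify


def toy_model(text):
    # Placeholder for a machine learning classifier (hypothetical rule).
    return "question" if text.rstrip().endswith("?") else "other"


classify = make_two_stage(
    patterns=[(r"\brefund\b", "billing"), (r"\bpassword\b", "account")],
    fallback=toy_model,
)
print(classify("I want a refund for my order"))
print(classify("Where is the nearest store?"))
```

Returning the stage alongside the label makes it easy to measure how many items the pattern stage resolves without invoking the learned classifier.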
Abstract:
A system that facilitates determining a user's intent given a user search query comprises a search engine that is employed to search over a collection of objects within a data store to retrieve a user search result set. The objects within the result set are associated with queries that were previously utilized to locate such objects. A level of relatedness between the previous queries and the user search query is determined, and previous queries that are associated with a result set that is novel and related to the user search result set are returned to the user.
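Selecting previous queries whose result sets are both related and novel can be sketched as below. Relatedness is approximated here by the Jaccard overlap between result sets and novelty by the fraction of results the user has not yet seen; these measures, the thresholds, and the name `suggest_queries` are assumptions, since the abstract does not define how either quantity is computed.

```python
def suggest_queries(user_results, previous, min_related=0.2, min_novel=0.3):
    """Return previous queries whose results overlap the user's yet add new items.

    user_results: object ids retrieved for the user's query.
    previous: dict mapping a previous query to the object ids it located.
    """
    user = set(user_results)
    scored = []
    for query, results in previous.items():
        found = set(results)
        if not user or not found:
            continue
        related = len(user & found) / len(user | found)  # Jaccard overlap
        novel = len(found - user) / len(found)           # share of unseen results
        if related >= min_related and novel >= min_novel:
            scored.append((related, query))
    return [q for _, q in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Hypothetical result sets.
user_results = ["d1", "d2", "d3"]
previous = {
    "jaguar car": ["d1", "d2", "d4", "d5"],  # related and partly new
    "jaguar": ["d1", "d2", "d3"],            # related but nothing new
    "python": ["d9"],                        # new but unrelated
}
print(suggest_queries(user_results, previous))
```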