Abstract:
User behavior modeling can include determining temporal- or time-based actions performed by various users. From the mined temporal-based user actions, future actions can be predicted. Certain implementations include providing information and/or services based on the predicted future actions. Some implementations include providing relevant information, services, and/or goods regarding the predicted future action.
Abstract:
A method of generating training data for a search engine begins by retrieving log data pertaining to user click behavior. The log data is analyzed with a click model to determine the relevance of each of a plurality of pages to a query. The click model includes a parameter for a user intent bias, which represents the intent of a user in performing a search. The relevance of the pages is then converted into training data.
Abstract:
A “General Click Model” (GCM) is constructed using a Bayesian network that is inherently capable of modeling “tail queries” by building the model on multiple attribute values that are shared across queries. More specifically, the GCM learns and predicts user click behavior towards URLs displayed on a query results page returned by a search engine. Unlike conventional click modeling approaches that learn models based on individual queries, the GCM learns click models from multiple attributes, with the influence of different attribute values being measured by Bayesian inference. This provides an advantage in learning that enables the GCM to achieve better generalization and results than conventional click models, especially for tail queries. In addition, most conventional click models consider only position and the identity of URLs when learning the model. In contrast, the GCM considers more session-specific attributes in making a final prediction for anticipated or expected user click behaviors.
Abstract:
Data from a click log may be used to generate training data for a search engine. User click behavior and user post-click behavior may be used to assess the relevance of a page to a query. Labels for training data may be generated based on data from the click log. The labels may pertain to the relevance of a page to a query. For example, user post-click behavior that may be examined includes the amount of time that a user remains on a target page after clicking one of the search results.
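The labeling step described above can be pictured with a minimal sketch. It assumes the click log has already been reduced to (query, url, dwell_seconds) records, and it uses a hypothetical 30-second dwell threshold; the abstract states neither the record layout nor any threshold, and the function name is illustrative only.

```python
from collections import defaultdict

# Hypothetical threshold: the abstract does not state a value.
SATISFIED_DWELL_SECONDS = 30.0


def label_from_click_log(records):
    """Turn (query, url, dwell_seconds) click records into relevance labels.

    The label for a (query, url) pair is the fraction of its clicks whose
    dwell time reached the threshold, so it always lies in [0.0, 1.0].
    """
    dwell_times = defaultdict(list)
    for query, url, seconds in records:
        dwell_times[(query, url)].append(seconds)

    labels = {}
    for pair, times in dwell_times.items():
        satisfied = sum(1 for t in times if t >= SATISFIED_DWELL_SECONDS)
        labels[pair] = satisfied / len(times)
    return labels
```

A label near 1.0 suggests that users who clicked the page tended to stay on it, while a label near 0.0 suggests quick returns to the results page. A production system would likely combine this signal with other click and post-click features.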
Abstract:
Techniques for training a non-linear support vector machine utilizing a stochastic gradient descent algorithm are provided. The computations of the stochastic gradient descent algorithm are parallelized via a number of processors. Calculations of the stochastic gradient descent algorithm on a particular processor may be combined according to a packing strategy before communicating the results of the calculations with the other processors.
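One possible reading of the packing idea is sketched below. The abstract does not name the update rule or the packing strategy, so several things here are assumptions: the update is a Pegasos-style kernel SGD step, the kernel is an RBF kernel, the "processors" are simulated as index shards inside a single process, and all function and parameter names are hypothetical. The point of the sketch is that the kernel scores for a whole pack of samples are gathered in one communication round, after which the updates inside the pack are applied locally.

```python
import math


def rbf(a, b, gamma=1.0):
    """RBF kernel between two points given as tuples of floats."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def train_packed(X, y, order, lam=0.1, pack=4, workers=2):
    """Pegasos-style kernel SGD with `pack` updates per communication round.

    X: list of points, y: labels in {-1, +1}, order: sample indices to visit.
    Returns the signed coefficient of each training point.
    """
    n = len(X)
    alpha = [0.0] * n
    # Each simulated worker owns a strided shard of the coefficients.
    shards = [range(w, n, workers) for w in range(workers)]
    t = 0
    for start in range(0, len(order), pack):
        batch = order[start:start + pack]
        # One communication round: every worker reports its partial score
        # for all samples in the pack, and the partial scores are summed.
        partial = [
            [sum(alpha[j] * rbf(X[j], X[i]) for j in shard) for i in batch]
            for shard in shards
        ]
        scores = [sum(column) for column in zip(*partial)]
        # Apply the packed updates without further communication, keeping
        # the cached scores consistent with the changing coefficients.
        for k, i in enumerate(batch):
            t += 1
            violated = y[i] * scores[k] < 1.0  # margin check on current model
            shrink = 1.0 - 1.0 / t             # regularization shrinkage
            alpha = [a * shrink for a in alpha]
            scores = [s * shrink for s in scores]
            if violated:
                delta = y[i] / (lam * t)
                alpha[i] += delta
                scores = [s + delta * rbf(X[i], X[m])
                          for s, m in zip(scores, batch)]
    return alpha
```

Under these assumptions the packed run visits samples in the same order as an unpacked run and should reach the same coefficients up to floating-point rounding; only the number of communication rounds changes, from one per sample to one per pack.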
Abstract:
A smart user-centric information aggregation system allows a user to define a region of content displayed in a display of a device and performs information aggregation on behalf of the user. The smart user-centric information aggregation system searches, aggregates and groups information related to content included in the region of content for the user while the user can continue to perform his/her original course of actions without interruption. After finding information related to the desired content, the smart user-centric information aggregation system may notify the user and present the found information to the user upon receiving confirmation from the user. The smart user-centric information aggregation system may continue to find new related information and update the presentation with the newly found information periodically, in some instances without user intervention or input.
Abstract:
Described is a technology in which new words (including a phrase or set of Chinese characters) are mined from a query log. The new words may be added to (or otherwise supplement) an IME dictionary. A set of candidate queries may be selected from the log based upon market (e.g., the Chinese market) and/or by language. From this set, various filtering steps are performed to locate only new words that are frequently used. For example, only frequent queries are kept for further processing, which may include filtering out queries based on length (e.g., fewer than two or more than eight Chinese characters), and/or filtering out queries that contain too many stop-words. Processing may also include filtering out a query that is a substring of a larger query, or vice-versa. Also described is Pinyin-based clustering and filtering, and filtering out queries already handled in the dictionary.
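The filtering steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the described system: the stop-character set, the frequency threshold, and the stop-word ratio are hypothetical values, stop-words are approximated by single characters, only the "drop a query contained in a larger query" direction of the substring filter is shown, and Pinyin-based clustering is omitted.

```python
from collections import Counter

# Hypothetical stop characters; the abstract does not list any.
STOP_CHARS = {"的", "了", "是"}


def mine_new_words(queries, dictionary, min_count=2, min_len=2, max_len=8,
                   max_stop_ratio=0.5):
    """Return candidate new words from a query log, sorted for stable output."""
    counts = Counter(queries)
    # Keep only frequent queries.
    frequent = [q for q, c in counts.items() if c >= min_count]
    # Keep queries of two to eight characters.
    sized = [q for q in frequent if min_len <= len(q) <= max_len]
    # Drop queries dominated by stop characters.
    clean = [q for q in sized
             if sum(ch in STOP_CHARS for ch in q) / len(q) <= max_stop_ratio]
    # Drop a query that is a substring of a larger surviving query.
    standalone = [q for q in clean
                  if not any(q != other and q in other for other in clean)]
    # Drop queries the dictionary already handles.
    return sorted(q for q in standalone if q not in dictionary)
```

For instance, given a log in which "微博" and "新浪微博" are both frequent, this sketch keeps only the longer query. Which direction of the substring filter applies in a given case is a design choice the abstract leaves open.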