摘要:
The search intent co-learning technique described herein learns user search intents from rule-based training data and denoises and debiases this data. The technique generates several sets of biased and noisy training data using different rules. It trains each of a set of classifiers using different training data sets independently. The classifiers are then used to categorize the training data as well as any unlabeled data. The classified data confidently classified by one classifier is added to other training data sets, and the wrongly classified data is filtered out from the training data sets, so as to create an accurate training data set with which to train a classifier to learn a user's intent for submitting a search query string or targeting a user for on-line advertising based on user behavior.
摘要:
Techniques for identifying similar queries based on their overall similarity and partial similarity of time series of frequencies of the queries are provided. To identify queries that are similar to a target query, the query analysis system generates, for each query, an overall similarity score for that query and the target query based on the time series of the query and the target query. The query analysis system also generates, for each query, partial similarity scores for the query and the target query based on various time sub-series of the overall time series of the queries. The query analysis system then identifies queries as being similar to the target query based on the overall similarity scores and the partial similarity scores of the queries.
摘要:
Representing queries and determining similarity of queries based on an autoregressive integrated moving average (“ARIMA”) model is provided. A query analysis system represents each query by its ARIMA coefficients. The query analysis system may estimate the frequency information for a desired past or future interval based on frequency information for some initial intervals. The query analysis system may also determine the similarity of a pair of queries based on the similarity of their ARIMA coefficients. The query analysis system may use various metrics, such as a correlation metric, to determine the similarity of the ARIMA coefficients.
摘要:
Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
摘要:
Techniques for analyzing and modeling the frequency of queries are provided by a query analysis system. A query analysis system analyzes frequencies of a query over time to determine whether the query is time-dependent or time-independent. The query analysis system forecasts the frequency of time-dependent queries based on their periodicities. The query analysis system forecasts the frequency of time-independent queries based on causal relationships with other queries. To forecast the frequency of time-independent queries, the query analysis system analyzes the frequency of a query over time to identify significant increases in the frequency, which are referred to as “query events” or “events.” The query analysis system forecasts frequencies of time-independent queries based on queries with events that tend to causally precede events of the query to be forecasted.
摘要:
Some examples include displaying a user interface that includes attributes and entities that are determined to be related to an input entity identified by a user. Further, some implementations include displaying a structured data table that identifies attribute values associated with the input entity and selected related entities.
摘要:
A smart user-centric information aggregation system allows a user to define a region of content displayed in a display of a device and performs information aggregation on behalf of the user. The smart user-centric information aggregation system searches, aggregates and groups information related to content included in the region of content for the user while the user can continue to perform his/her original course of actions without interruption. After finding information related to the desired content, the smart user-centric information aggregation system may notify the user and present the found information to the user upon receiving confirmation from the user. The smart user-centric information aggregation system may continue to find new related information and update the presentation with the newly found information periodically, in some instances without user intervention or input.
摘要:
Techniques are described for generating structured information from semi-structured web pages, and retrieving the structured knowledge in response to a user query that indicates a query intent. The structured information is automatically extracted offline from semi-structured web pages, through the use of an auto wrapper solution that is noise tolerant, scalable, and automatic. The structured information is stored in a knowledge base, and provided in response to a user search query that indicates a query intent. Extraction of structured information may also include clustering of pages based on their measured similarities. The clusters may be determined based on similar elements in the tag path text data of the pages. A minimum size threshold may be applied to the clusters.
摘要:
Techniques for identifying similar queries based on their overall similarity and partial similarity of time series of frequencies of the queries are provided. To identify queries that are similar to a target query, the query analysis system generates, for each query, an overall similarity score for that query and the target query based on the time series of the query and the target query. The query analysis system also generates, for each query, partial similarity scores for the query and the target query based on various time sub-series of the overall time series of the queries. The query analysis system then identifies queries as being similar to the target query based on the overall similarity scores and the partial similarity scores of the queries.
摘要:
Transfer of learning trains a new domain for the classification of search queries according to different tasks, as well as the generation of a corresponding domain-specific query classifier that may be used to classify the search queries according to the different tasks in the new domain. The transfer of learning may include preparing a new domain to receive classification knowledge from one or more source domains by populating the new domain with preliminary query patterns extracted for a search engine log. The transfer of learning may further include preparing the classification knowledge in each source domain for transfer to the new domain. The classification knowledge in each source domain may then be transferred to the new domain.