Abstract:
An architecture that mines the intent of a query from search log data. For a given query, the architecture finds the intent, the major URLs for that intent, and the intent attributes. The input is search log data, and the output is a database that contains the query intents mined from that data. Data mining techniques are employed to discover the major intents of queries in the click-through log data of a search engine. For each query, expanded queries are created and used together with the co-clicks of the original query and its expanded queries in the log data. Clustering is then performed on this co-click data to find the major intents of the query.
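The co-click clustering idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the method claimed in the abstract: the log is a list of (query, clicked URL) pairs, the hypothetical expansion rule treats any logged query containing all of the original query's terms as an expanded query, and the clustering is a simple greedy grouping by Jaccard overlap of the queries that clicked each URL.

```python
from collections import defaultdict

# Hypothetical click-through log: (query, clicked URL) pairs.
SAMPLE_LOG = [
    ("jaguar", "jaguar.com"),
    ("jaguar car", "jaguar.com"),
    ("jaguar car", "cars.example/jaguar"),
    ("jaguar animal", "zoo.example/jaguar"),
    ("jaguar animal", "wiki.example/Jaguar"),
]

def expanded_queries(query, log):
    """Assumed expansion rule: logged queries containing all terms of `query`."""
    terms = set(query.split())
    return {q for q, _ in log if terms <= set(q.split())}

def mine_intents(query, log, threshold=0.5):
    """Greedily cluster clicked URLs by overlap of the queries that clicked them."""
    family = expanded_queries(query, log)
    clickers = defaultdict(set)  # url -> queries in the family that clicked it
    for q, url in log:
        if q in family:
            clickers[url].add(q)

    clusters = []  # each cluster is a list of URLs; its first URL is the representative
    for url in sorted(clickers):
        for cluster in clusters:
            rep = clickers[cluster[0]]
            jaccard = len(rep & clickers[url]) / len(rep | clickers[url])
            if jaccard >= threshold:
                cluster.append(url)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([url])
    return clusters

intents = mine_intents("jaguar", SAMPLE_LOG)
```

On this toy log the car-related URLs and the animal-related URLs end up in separate clusters, each standing in for one candidate intent with its major URLs.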
Abstract:
A system and method for enterprise search includes one or more computer-readable media storing computer-executable instructions that, when executed on one or more processors, perform acts including: extracting one or more of term data, personal data, and metadata from one or more predetermined resources; retrieving, responsive to a query, a set of information derived from the extracted term data, personal data, and metadata; and receiving feedback responsive to the set of information, the feedback augmenting at least one of the one or more predetermined resources.
Abstract:
An n-gram and/or phrase extraction model may be trained based at least in part on search-focused information mined from a search-query log. The n-gram and/or phrase extraction model may extract key n-grams and/or phrases from retrieved electronic documents based at least in part on features and/or characteristics of the key n-grams and/or phrases and based at least in part on features and/or characteristics of the search-focused information. The extracted key n-grams and/or phrases may be weighted. A relevancy ranking model may be trained based at least in part on the information extracted by the n-gram and/or phrase extraction model. The relevancy ranking model may provide a relevancy ranking score for electronic documents listed in a search result based at least in part on weights of extracted key n-grams and/or phrases.
Abstract:
Various technologies and techniques are disclosed for calculating authorship dates for a document. A portion of the document in which to look for possible authorship dates is selected. The possible authorship dates are extracted from that portion. A revised authorship date for the document is generated using a neural network and returned to the application or process that requested the date.
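The first two steps (selecting a portion of the document and extracting possible dates from it) might look like the sketch below. It is only an illustration under stated assumptions: the "portion" is taken to be the first `head_chars` characters, only ISO-style `YYYY-MM-DD` dates are recognized, and the neural-network step that produces the revised date is deliberately left out.

```python
import re
from datetime import date

# Assumption: only ISO-style dates (YYYY-MM-DD) are treated as candidates.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def candidate_dates(text, head_chars=500):
    """Return valid calendar dates found in the first `head_chars` characters."""
    portion = text[:head_chars]  # hypothetical choice of the portion to scan
    found = []
    for match in DATE_PATTERN.finditer(portion):
        year, month, day = map(int, match.groups())
        try:
            found.append(date(year, month, day))
        except ValueError:
            # Looks like a date but is not a real calendar day (e.g. month 13).
            continue
    return found
```

The returned candidates would then be the input to whatever model picks or revises the final authorship date.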
Abstract:
Techniques and tools described herein mine social information from a source and store the social information in a database. Responsive to a search object, the techniques search the stored social information and determine social relationships. The techniques further provide, via a graphical user interface, the social relationships determined from the social information stored in the database. In several embodiments, the techniques enable social relationship feedback.
Abstract:
An architecture that extracts author information from general documents and uses it for search results ranking. The architecture performs automatic author value extraction and makes the extracted value available at index time for subsequent use in query processing and results ranking. Machine learning (e.g., a perceptron algorithm) is employed, and a set of input features for the perceptron algorithm is utilized for author value extraction. The extracted author value is converted into a feature for input to a ranking function that generates a ranking score for each document. The input features can also be weighted according to weighting criteria.
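As a rough illustration of how a perceptron could classify candidate author strings, the sketch below trains a standard binary perceptron on a handful of hand-made examples. The feature set (a "By " prefix, capitalized words, position near the top of the document) and the training data are hypothetical; the abstract does not list the actual input features.

```python
def features(candidate, line_index):
    """Hypothetical input features for one candidate author string."""
    words = candidate.split()
    return [
        1.0,                                                   # bias term
        1.0 if candidate.lower().startswith("by ") else 0.0,   # "By ..." prefix
        1.0 if words and all(w[0].isupper() for w in words) else 0.0,  # capitalized
        1.0 if line_index < 3 else 0.0,                        # near the top
    ]

def train_perceptron(examples, epochs=10):
    """Classic perceptron rule: update the weights only on misclassified examples."""
    weights = [0.0] * 4
    for _ in range(epochs):
        for x, label in examples:  # label is +1 (author) or -1 (not author)
            score = sum(w * xi for w, xi in zip(weights, x))
            if label * score <= 0:
                weights = [w + label * xi for w, xi in zip(weights, x)]
    return weights

def is_author(weights, x):
    """Predict whether a feature vector describes an author line."""
    return sum(w * xi for w, xi in zip(weights, x)) > 0

# Tiny made-up training set, for illustration only.
TRAINING = [
    (features("By Jane Doe", 1), +1),
    (features("Quarterly Sales Report", 0), -1),
    (features("the results were mixed", 7), -1),
    (features("By John Smith", 2), +1),
]
WEIGHTS = train_perceptron(TRAINING)
```

In a ranking pipeline, the prediction (or the extracted value itself) would be turned into one more feature for the ranking function; that step is not shown here.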
Abstract:
A model generated from search log data predicts a hidden state based on a query to determine a context of the query, such as for providing re-ranked search results, query suggestions and/or URL recommendations.
Abstract:
A suffix-tree index may be constructed from search engine search logs. This suffix-tree is scalable and suitable for use in a distributed computing environment. Data mining against the data may proceed with functions including a forward search, backward search, and/or query session retrieval.
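One plausible reading of forward and backward search over such an index is sketched below. It is a simplification under stated assumptions: it uses an uncompressed suffix trie over query sessions rather than a true (compressed, distributed) suffix tree, forward search is taken to mean "which queries follow this sequence", and backward search is answered by a second trie built over reversed sessions.

```python
from collections import Counter

def build_suffix_trie(sessions):
    """Index every suffix of every session (a session is a list of queries)."""
    root = {}
    for session in sessions:
        for start in range(len(session)):
            node = root
            for query in session[start:]:
                child = node.setdefault(query, {"count": 0, "next": {}})
                child["count"] += 1
                node = child["next"]
    return root

def forward_search(trie, sequence):
    """Count the queries that immediately follow `sequence` in the indexed sessions."""
    node = trie
    for query in sequence:
        if query not in node:
            return Counter()
        node = node[query]["next"]
    return Counter({q: child["count"] for q, child in node.items()})

def backward_search(reversed_trie, sequence):
    """Count the queries that immediately precede `sequence` (trie of reversed sessions)."""
    return forward_search(reversed_trie, list(reversed(sequence)))

# Hypothetical query sessions, for illustration only.
SESSIONS = [
    ["python", "python tutorial", "python list"],
    ["python", "python tutorial"],
    ["java", "python tutorial", "python list"],
]
TRIE = build_suffix_trie(SESSIONS)
REVERSED_TRIE = build_suffix_trie([list(reversed(s)) for s in SESSIONS])
```

With these sessions, a forward search for `["python tutorial"]` reports what users searched next, and a backward search reports what they searched just before.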
Abstract:
Techniques described herein provide a context-aware query suggestion process. The context of a current query may be calculated by analyzing a sequence of previous queries. Historical search data may be mined to generate groups of query suggestion candidates. Using the context of the current query, the current query may be matched against the groups of query suggestion candidates to find a matching candidate, which may be provided to the user.