Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for augmenting a search engine index that indexes resources from a collection of resources. In one aspect, a method of augmenting a first search engine index that indexes resources from a first collection of resources includes the actions of identifying a first resource, in the first collection of resources, that is indexed in the first search engine index and for which a value of a search engine ranking signal is not available, wherein a search engine uses values of the search engine ranking signal in ranking resources in response to received search queries; processing text from the first resource using a machine learning model, the machine learning model being configured to process the text to predict a value of the search engine ranking signal for the first resource; and updating the first search engine index by associating the predicted value of the search engine ranking signal with the first resource in the first search engine index.
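The flow described above can be illustrated with a minimal sketch. The index layout (a dictionary keyed by resource id), the function name `augment_index`, and the stand-in `toy_model` are all assumptions made for illustration; the abstract does not specify the model or the index format.

```python
from typing import Callable, Dict

# Hypothetical index layout: resource id -> {"text": str, "signal": float or None}
Index = Dict[str, Dict[str, object]]


def augment_index(index: Index, predict: Callable[[str], float]) -> int:
    """Fill in missing ranking-signal values and return how many were filled."""
    filled = 0
    for entry in index.values():
        # Only resources whose signal value is unavailable are processed.
        if entry.get("signal") is None:
            entry["signal"] = predict(str(entry["text"]))
            filled += 1
    return filled


def toy_model(text: str) -> float:
    # Stand-in for a trained model (assumption): more words give a higher
    # score, capped at 1.0.
    return min(len(text.split()) / 10.0, 1.0)


index: Index = {
    "doc-a": {"text": "short page", "signal": 0.7},
    "doc-b": {"text": "a page with five words", "signal": None},
}
filled = augment_index(index, toy_model)
```

In this sketch only `doc-b` is updated, because `doc-a` already carries a signal value.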
Abstract:
A system and method for training a search query classifier may be used to develop a large database of search queries used to access inappropriate, sensitive, or offensive content. A database of well-known and frequently used search queries for accessing inappropriate or sensitive content is expanded by monitoring additional search queries received from a user within a preset time period of the user submitting one of the well-known and frequently used search queries. The additional search queries received from the user are further evaluated to determine whether they are likely associated with inappropriate, sensitive, or offensive content.
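The expansion step might look like the following sketch. The log format (tuples of user id, timestamp in seconds, and query), the name `collect_candidates`, and the choice to look both before and after a seed query are assumptions; the later evaluation of the candidates is not shown.

```python
from typing import Dict, Iterable, List, Set, Tuple

LogEntry = Tuple[str, float, str]  # (user id, timestamp in seconds, query)


def collect_candidates(
    log: Iterable[LogEntry], seed_queries: Set[str], window_seconds: float
) -> Set[str]:
    """Return non-seed queries issued by a user near one of that user's seed queries."""
    entries = list(log)

    # Record when each user submitted a known seed query.
    seed_times: Dict[str, List[float]] = {}
    for user, ts, query in entries:
        if query in seed_queries:
            seed_times.setdefault(user, []).append(ts)

    # Keep other queries from the same user that fall inside the window.
    candidates: Set[str] = set()
    for user, ts, query in entries:
        if query in seed_queries:
            continue
        if any(abs(ts - s) <= window_seconds for s in seed_times.get(user, [])):
            candidates.add(query)
    return candidates


log = [
    ("u1", 0, "seed"),
    ("u1", 30, "follow up"),
    ("u1", 500, "later"),
    ("u2", 10, "unrelated"),
]
candidates = collect_candidates(log, {"seed"}, window_seconds=60)
```

Queries collected this way are only candidates; the abstract describes a further evaluation before any of them is added to the database.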
Abstract:
Methods, systems, and media for presenting search results are provided. In accordance with some embodiments, the method comprises: receiving text corresponding to a search query; determining whether a content rating score associated with the search query is below a predetermined threshold, wherein the score is calculated by: identifying a first plurality of search results retrieved using the search query, wherein each search result is associated with one of a plurality of content ratings classes; and calculating the content rating score that is a proportion of search results associated with at least one of the content ratings classes among the first plurality of search results; in response to determining that the content rating score is below the predetermined threshold, identifying a second plurality of search results to be presented based on the search query; and causing the second plurality of search results to be presented.
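As a worked illustration of the score, the sketch below computes the proportion of results that fall into a chosen set of rating classes and compares it with a threshold. The class labels, the names `content_rating_score` and `should_present`, and the handling of an empty result list are assumptions.

```python
from typing import Sequence, Set


def content_rating_score(result_classes: Sequence[str], flagged: Set[str]) -> float:
    """Proportion of results whose rating class is in the flagged set."""
    if not result_classes:
        return 0.0  # assumption: no results means nothing is flagged
    hits = sum(1 for c in result_classes if c in flagged)
    return hits / len(result_classes)


def should_present(
    result_classes: Sequence[str], flagged: Set[str], threshold: float
) -> bool:
    # Results are presented only when the score is below the threshold.
    return content_rating_score(result_classes, flagged) < threshold


classes = ["general", "general", "mature", "general"]
score = content_rating_score(classes, {"mature"})
```

With one flagged result out of four, the score is 0.25, so a threshold of 0.5 allows presentation and a threshold of 0.2 does not.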
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating query suggestions based on search data. In one aspect, a method includes receiving, by one or more computers, a first query, determining query refinements based on the first query, generating, from the query refinements, refinement clusters, each refinement cluster corresponding to a particular topic and each refinement cluster including query refinements that are determined to belong to the particular topic to which the refinement cluster corresponds, ranking the refinement clusters, and selecting the refinement cluster that is highest in the ranking relative to other refinement clusters in the ranking as a first search refinement cluster for the first query.
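A minimal sketch of the clustering and ranking steps follows. The topic-assignment function `topic_of`, the refinement counts, and ranking clusters by total count are assumptions; the abstract does not say how topics are determined or how clusters are ranked.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def top_refinement_cluster(
    refinement_counts: Dict[str, int], topic_of: Callable[[str], str]
) -> Tuple[str, List[str]]:
    """Group refinements by topic and return the highest-ranked cluster."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for refinement in refinement_counts:
        clusters[topic_of(refinement)].append(refinement)

    # Assumed ranking: total count of the refinements in each cluster.
    ranked = sorted(
        clusters.items(),
        key=lambda item: sum(refinement_counts[r] for r in item[1]),
        reverse=True,
    )
    return ranked[0]


def toy_topic(refinement: str) -> str:
    # Hypothetical topic assignment for refinements of the query "jaguar".
    animal_words = {"habitat", "diet"}
    return "animal" if animal_words & set(refinement.split()) else "car"


counts = {
    "jaguar car price": 40,
    "jaguar suv": 25,
    "jaguar habitat": 30,
    "jaguar diet": 20,
}
topic, members = top_refinement_cluster(counts, toy_topic)
```

Here the "car" cluster totals 65 against 50 for "animal", so it is selected as the first search refinement cluster.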
Abstract:
A system and method for providing a search experience in which users are protected from exposure to inappropriate, offensive, or sensitive content is described. A search system may classify a search query and candidate search results obtained in response to the search query. Based on the classification of the search query and search results, the candidate search results may be modified to generate a set of search results presented to a user such that the presented search results do not include inappropriate, sensitive, or offensive content.
Abstract:
Methods, systems, and apparatus, including computer program products, for constructing text classifiers. The method includes receiving a collection of candidate phrases for a given topic; filtering the received candidate phrases to remove erroneously included candidate phrases; assigning weights to the candidate phrases, including scoring each candidate phrase using an initial classifier and assigning weights to the candidate phrases based on the scores; and generating a linear classifier using the filtered and weighted candidate phrases, where the linear classifier varies the weight for each candidate phrase depending on the length of the document being classified.
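The final step, a linear classifier whose phrase weights depend on document length, could be sketched as below. The square-root damping, the threshold, and the phrase weights are assumptions chosen for illustration; the abstract does not state how the weights vary with length, and the filtering and initial scoring steps are not shown.

```python
import math
from typing import Dict, Tuple


def classify(
    document: str, phrase_weights: Dict[str, float], threshold: float = 1.0
) -> Tuple[float, bool]:
    """Score a document with length-scaled phrase weights."""
    words = document.lower().split()
    text = " ".join(words)

    # Assumed length adjustment: weights shrink as the document grows, so a
    # single phrase hit counts for less in a long document.
    scale = 1.0 / math.sqrt(max(len(words), 1))

    # Simple substring counting; a phrase also matches inside longer words.
    score = sum(w * scale * text.count(p) for p, w in phrase_weights.items())
    return score, score >= threshold


weights = {"road bike": 2.0, "helmet": 1.0}
short_score, short_match = classify("Road bike helmet sale", weights)
```

The same two phrase hits that classify a four-word document as on-topic would fall below the threshold in a sixteen-word document under this scaling.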
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining query refinements using search data. In one aspect, a method includes receiving a first query and a second query each comprising one or more n-grams for a user session, determining a first set of query refinements for the first query, determining a second set of query refinements from the first set of query refinements, each query refinement in the second set of query refinements including at least one n-gram that is similar to an n-gram from the first query and at least one n-gram that is similar to an n-gram from the second query, scoring each query refinement in the second set of query refinements, selecting a third query from a group consisting of the second set of query refinements and the second query, and providing the third query as input to a search operation.
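The selection of the third query can be sketched as follows. Treating n-grams as lowercase single words, treating "similar" as an exact match, scoring refinements by a count, and falling back to the second query only when no refinement qualifies are all simplifying assumptions.

```python
from typing import Dict, Set


def unigrams(query: str) -> Set[str]:
    # Assumption: n-grams are single lowercase words.
    return set(query.lower().split())


def pick_third_query(
    first_query: str, second_query: str, refinement_counts: Dict[str, int]
) -> str:
    """Choose a refinement of the first query that also overlaps the second."""
    first, second = unigrams(first_query), unigrams(second_query)

    # Second set: refinements sharing a term with each of the two queries.
    second_set = {
        r: count
        for r, count in refinement_counts.items()
        if unigrams(r) & first and unigrams(r) & second
    }
    if not second_set:
        return second_query  # assumed fallback
    # Assumed scoring: the refinement with the highest count wins.
    return max(second_set, key=second_set.get)


counts = {
    "hotels in paris": 50,
    "cheap hotels": 80,
    "paris hotels near louvre": 10,
}
third = pick_third_query("hotels", "paris", counts)
```

In this example "cheap hotels" has the highest count but is excluded because it shares no term with the second query, so "hotels in paris" is passed to the search operation.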
Abstract:
A system and method for generating a road network based on satellite imagery. Plural pixels corresponding to satellite imagery of a region are obtained. For each of the plural pixels, a probability value corresponding to the probability that the pixel belongs within the road network is calculated. A grayscale image is formed based on the calculated probability values. Plural curves are produced based on the grayscale image, wherein the producing of each curve includes positioning a shape on the grayscale image so that an average intensity of the grayscale image covered by the shape exceeds a preset threshold, moving the shape about the grayscale image while the average intensity is maintained, and tracking the movement of the shape to produce the curve. A planar-connected graph is generated by connecting at least portions of the plural curves. The planar-connected graph corresponds to the road network.
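The curve-tracing idea can be illustrated on a tiny grid. The square window, the four-neighbour greedy movement, and the function names are assumptions; the abstract does not specify the shape or how it is moved, and the graph-construction step is not shown. The start position is assumed to already satisfy the threshold.

```python
from typing import List, Tuple

Grid = List[List[float]]
Point = Tuple[int, int]


def to_grayscale(prob: Grid) -> List[List[int]]:
    """Map per-pixel road probabilities in [0, 1] to intensities in 0..255."""
    return [[round(p * 255) for p in row] for row in prob]


def window_mean(img: List[List[int]], r: int, c: int, size: int) -> float:
    vals = [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    return sum(vals) / len(vals)


def trace_curve(
    img: List[List[int]], start: Point, size: int, threshold: float
) -> List[Point]:
    """Slide a square window while its mean intensity stays above the threshold."""
    rows, cols = len(img) - size + 1, len(img[0]) - size + 1
    path, seen = [start], {start}
    r, c = start
    while True:
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < rows
                and 0 <= nc < cols
                and (nr, nc) not in seen
                and window_mean(img, nr, nc, size) > threshold
            ):
                r, c = nr, nc
                seen.add((r, c))
                path.append((r, c))
                break
        else:
            # No neighbouring position keeps the mean above the threshold.
            return path


prob = [
    [0.0, 0.0, 0.0, 0.0],
    [0.8, 0.8, 0.8, 0.8],
    [0.0, 0.0, 0.0, 0.0],
]
curve = trace_curve(to_grayscale(prob), start=(1, 0), size=1, threshold=128)
```

The tracked window positions follow the bright middle row, which stands in for a road segment in this toy image.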