摘要:
Systems and methods for related term suggestion are described. In one aspect, term clusters are generated as a function of calculated similarity of term vectors. Each term vector having been generated from search results associated with a set of high frequency of occurrence (FOO) historical queries previously submitted to a search engine. Responsive to receiving a term/phrase from an entity, the term/phrase is evaluated in view of terms/phrases in the term clusters to identify one or more related term suggestions.
摘要:
A system for augmenting click-through data with latent information present in the click-through data for use in generating search results that are better tailored to the information needs of a user submitting a query is provided. The augmentation system creates a three-dimensional matrix with the dimensions of users, queries, and documents. The augmentation system then performs a three-order singular value decomposition of the three-dimensional matrix to generate a three-dimensional core singular value matrix and a left singular matrix for each dimension. The augmentation system finally multiplies the three-dimensional core singular value matrix by the left singular matrices to generate an augmented three-dimensional matrix that explicitly contains the information that was latent in the un-augmented three-dimensional matrix.
摘要:
A method and system for calculating the significance of a sentence within a document is provided. The summarization system calculates the significance of the sentences of a document and selects the most significant sentences as the summary of the document. The summarization system calculates the significance of a sentence based on the “important” words of the document that are contained within the sentence. The summarization system calculates the importance of words of the document using various scoring techniques and then combines the scores to classify a word as important or not important. The summarization system can then be used to identify significant sentences of the document based on the important words that a sentence contains and select significant sentences as a summary of the document.
摘要:
A method and system for adapting search results of a query to the information needs of the user submitting the query is provided. A search system analyzes click-through triplets indicating that a user submitted a query and that the user selected a document from the results of the query. To overcome the large size and sparseness of the click-through data, the search system when presented with an input triplet comprising a user, a query, and a document determines a probability that the user will find the input document important by smoothing the click-through triplets. The search system then orders documents of the result based on the probability of their importance to the input user.
摘要:
A method and system for determining similarity between items is provided. To calculate similarity scores for pairs of items, the similarity system initializes a similarity score for each pair of objects and each pair of features. The similarity system then iteratively calculates the similarity scores for each pair of objects based on the similar scores of the pairs of features calculated during a previous iteration and calculates the similarity scores for each pair of features based on the similarity scores of the pairs of objects calculated during a previous iteration. The similarity system implements an algorithm that is based on a recursive definition of the similarities between objects and between features. The similarity system continues the iterations of recalculating the similarity scores until the similarity scores converge on a solution.
摘要:
A method and system for generating a projection matrix for projecting data from a high dimensional space to a low dimensional space. The system establishes an objective function based on a maximum margin criterion matrix. The system then provides data samples that are in the high dimensional space and have a class. For each data sample, the system incrementally derives leading eigenvectors of the maximum margin criterion matrix based on the derivation of the leading eigenvectors of the last data sample. The derived eigenvectors compose the projection matrix, which can be used to project data samples in a high dimensional space into a low dimensional space.
摘要:
Systems and methods for clustering-based text classification are described. In one aspect text is clustered as a function of labeled data to generate cluster(s). The text includes the labeled data and unlabeled data. Expanded labeled data is then generated as a function of the cluster(s). The expanded label data includes the labeled data and at least a portion of unlabeled data. Discriminative classifier(s) are then trained based on the expanded labeled data and remaining ones of the unlabeled data.
摘要:
Seed keywords are leveraged to provide expanded keywords that are then associated with relevant advertisers. Instances can also include locating potential advertisers based on the expanded keywords. Inverse lookup techniques are employed to determine which keywords are associated with an advertiser. Filtering can then be employed to eliminate inappropriate keywords for that advertiser. The keywords are then automatically revealed to the advertiser for consideration as relevant search terms for their advertisements. In this manner, revenue for a search engine and/or for an advertiser can be substantially enhanced through the automatic expansion of relevant search terms. Advertisers also benefit by having larger and more relevant search term selections automatically available to them, saving them both time and money.
摘要:
One aspect relates to clustering a group of objects of a first object type based on a relative importance using links extending between objects of the first object type and certain objects of the second object type. In one embodiment, the first object type is a Web page object type and the second type is a user object type.
摘要:
A system for augmenting click-through data with latent information present in the click-through data for use in generating search results that are better tailored to the information needs of a user submitting a query is provided. The augmentation system creates a three-dimensional matrix with the dimensions of users, queries, and documents. The augmentation system then performs a three-order singular value decomposition of the three-dimensional matrix to generate a three-dimensional core singular value matrix and a left singular matrix for each dimension. The augmentation system finally multiplies the three-dimensional core singular value matrix by the left singular matrices to generate an augmented three-dimensional matrix that explicitly contains the information that was latent in the un-augmented three-dimensional matrix.