Abstract:
One embodiment of the present invention provides a system that builds an association tensor (such as a matrix) to facilitate document and word-level processing operations. During operation, the system uses terms from a collection of documents to build an association tensor, which contains values representing pair-wise similarities between terms in the collection of documents. During this process, if a given value in the association tensor is calculated based on an insufficient number of samples, the system determines a corresponding value from a reference document collection, and then substitutes the corresponding value for the given value in the association tensor. After the association tensor is obtained, a dimensionality reduction method is applied to compute a low-dimensional vector space representation for the vocabulary terms. Document vectors are computed as linear combinations of term vectors.
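The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the association tensor is a 2-D matrix of cosine-normalized document co-occurrence counts, that "number of samples" for a term pair means the document frequency of the rarer term, and that truncated SVD is the dimensionality reduction method. The function names and the `min_samples` parameter are hypothetical.

```python
import numpy as np

def cooccurrence_counts(docs, vocab):
    """counts[i, j] = number of documents containing both term i and term j."""
    index = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        ids = sorted({index[t] for t in doc if t in index})
        counts[np.ix_(ids, ids)] += 1.0
    return counts

def normalize(counts):
    """Cosine-style normalization: counts[i, j] / sqrt(df_i * df_j)."""
    df = np.diag(counts)
    denom = np.sqrt(np.outer(df, df))
    return np.divide(counts, denom, out=np.zeros_like(counts), where=denom > 0)

def association_matrix(docs, ref_docs, vocab, min_samples=2):
    """Build the association matrix, falling back to the reference collection."""
    counts = cooccurrence_counts(docs, vocab)
    assoc = normalize(counts)
    ref_assoc = normalize(cooccurrence_counts(ref_docs, vocab))
    # Assumption: a pair is under-sampled when its rarer term appears in
    # fewer than min_samples documents of the main collection.
    df = np.diag(counts)
    support = np.minimum.outer(df, df)
    sparse = support < min_samples
    assoc[sparse] = ref_assoc[sparse]  # substitute the reference values
    return assoc

def term_vectors(assoc, k):
    """Low-dimensional term vectors via truncated SVD (one possible choice)."""
    u, s, _ = np.linalg.svd(assoc)
    return u[:, :k] * s[:k]

def document_vector(doc, vocab, term_vecs):
    """Document vector as a linear combination of term vectors (count weights)."""
    index = {term: i for i, term in enumerate(vocab)}
    weights = np.zeros(len(vocab))
    for t in doc:
        if t in index:
            weights[index[t]] += 1.0
    return weights @ term_vecs
```

Weighting term vectors by raw term counts is only one option; any other weighting scheme (for example, TF-IDF) still yields a linear combination of term vectors.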
Abstract:
The present invention is directed to methods for improving flour quality (e.g., a flour correction process) by treating flour with a raw starch degrading enzyme.
Abstract:
Various methods formulated using a geometric interpretation for identifying bilingual pairs in comparable corpora using a bilingual dictionary are disclosed. The methods may be used separately or in combination to compute the similarity between bilingual pairs.
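The abstract does not specify the individual methods, so the following is only a sketch of one common geometric approach and not necessarily any of the disclosed ones. It assumes each word is represented by a sparse context vector, that the bilingual dictionary maps source-language dimensions onto target-language dimensions, and that similarity is the cosine of the angle between the two vectors in the target space. All names are hypothetical.

```python
import math
from collections import defaultdict

def translate_vector(src_vec, dictionary):
    """Map a source-language context vector onto target-language dimensions.

    A source word with several translations splits its weight evenly among
    them (an assumption); words missing from the dictionary are dropped.
    """
    out = defaultdict(float)
    for word, weight in src_vec.items():
        translations = dictionary.get(word, [])
        for target in translations:
            out[target] += weight / len(translations)
    return dict(out)

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def pair_similarity(src_vec, tgt_vec, dictionary):
    """Similarity of a candidate bilingual pair in the target vector space."""
    return cosine(translate_vector(src_vec, dictionary), tgt_vec)
```

Scores from several such measures could be combined, for instance by averaging, which matches the abstract's remark that the methods may be used separately or in combination.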
Abstract:
A unique system and method that facilitates improving the ranking of items is provided. The system and method involve re-ranking decreasing subsets of high ranked items in separate stages. In particular, a basic ranking component can rank a set of items. A subset of the top or high ranking items can be taken and used as a new training set to train a component for improving the ranking among these high ranked documents. This process can be repeated on an arbitrary number of successive high ranked subsets. Thus, high ranked items can be reordered in separate stages by focusing on the higher ranked items to facilitate placing the most relevant items at the top of a search results list.
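The staged re-ranking loop described above might look like the sketch below. It assumes each stage is given only the current top-k items, trains a fresh scoring function on them, and reorders that subset while leaving lower-ranked items in place. `train_ranker` is a hypothetical callback standing in for whatever learning procedure is used; the abstract does not name one.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def staged_rerank(
    items: Sequence[T],
    base_scorer: Callable[[T], float],
    train_ranker: Callable[[List[T]], Callable[[T], float]],
    subset_sizes: Sequence[int],
) -> List[T]:
    """Re-rank successively smaller top subsets in separate stages."""
    # Stage 0: the basic ranking component orders the full item set.
    ranked = sorted(items, key=base_scorer, reverse=True)
    for k in subset_sizes:  # expected to be decreasing, e.g. [100, 20, 5]
        top, rest = ranked[:k], ranked[k:]
        # Train a new scorer using only the current high-ranked subset.
        stage_scorer = train_ranker(top)
        # Reorder the subset; items below the cut keep their positions.
        ranked = sorted(top, key=stage_scorer, reverse=True) + rest
    return ranked
```

Because every stage only touches the head of the list, an item that falls outside an earlier cut can never re-enter the top positions in a later stage.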