发明申请
US20060106767A1 System and method for identifying query-relevant keywords in documents with latent semantic analysis 有权
在具有潜在语义分析的文档中识别查询相关关键词的系统和方法

System and method for identifying query-relevant keywords in documents with latent semantic analysis
摘要:
A system and method for identifying query-related keywords in documents found in a search using latent semantic analysis. The documents are represented as a document term matrix M containing one or more document term-weight vectors d, which may be term-frequency (tf) vectors or term-frequency inverse-document-frequency (tf-idf) vectors. This matrix is subjected to a truncated singular value decomposition. The resulting transform matrix U can be used to project a query term-weight vector q into the reduced N-dimensional space, followed by its expansion back into the full vector space using the inverse of U. To perform a search, the similarity of qexpanded is measured relative to each candidate document vector in this space. Exemplary similarity functions are dot product and cosine similarity. Keywords are selected with the highest values in qexpanded that are also comprised in at least one document. Matching keywords from the query may be highlighted in the search results.
信息查询
0/0