Scalable probabilistic latent semantic analysis

Patent application

US20070239431A1 Scalable probabilistic latent semantic analysis (legal status: in force)



Patent title: Scalable probabilistic latent semantic analysis
Application number: US11392763

Filing date: 2006-03-30
Publication number: US20070239431A1

Publication date: 2007-10-11
Inventors: Chenxi Lin, Jie Han, Guirong Xue, Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Jian Wang
Applicants: Chenxi Lin, Jie Han, Guirong Xue, Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Jian Wang
Applicant address: US WA Redmond
Assignee: Microsoft Corporation
Current assignee: Microsoft Corporation
Current assignee address: US WA Redmond
Main classification: G06F17/27
IPC classification: G06F17/27

Abstract:

A two-pass scalable probabilistic latent semantic analysis (PLSA) methodology is disclosed that may perform more efficiently, and in some cases more accurately, than traditional PLSA, especially where large and/or sparse data sets are provided for analysis. The improved methodology can greatly reduce the storage and/or computational costs of training a PLSA model. In the first pass of the two-pass methodology, objects are clustered into groups, and PLSA is performed on the groups instead of on the original individual objects. In the second pass, the conditional probability of a latent class given an object is obtained, which may be done by extending the training results of the first pass. During the second pass, the most likely latent classes for each object are identified.
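The abstract describes the two passes only at a high level. The Python sketch below illustrates one plausible reading under stated assumptions, not the patented method itself. Objects are rows of a count matrix (for example documents over words). In pass 1 they are grouped with plain k-means (the abstract does not name a clustering algorithm), and standard asymmetric PLSA is fitted by EM on the summed group counts. In pass 2, "extending the training results" is assumed to mean PLSA folding-in: each object's P(z|object) is re-estimated with P(w|z) held fixed, starting from its group's P(z|g). All function names (`kmeans`, `plsa`, `fold_in`, `two_pass_plsa`) and parameters are hypothetical.

```python
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def kmeans(vectors, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point seeding (assumed clusterer)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(dist(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def plsa(counts, n_topics, iters=200, seed=0):
    """Asymmetric PLSA fitted by EM; returns P(z|row) and P(w|z)."""
    rng = random.Random(seed)
    n_rows, n_words = len(counts), len(counts[0])
    p_z_row = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_rows)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(iters):
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_row = [[0.0] * n_topics for _ in range(n_rows)]
        for r in range(n_rows):
            for w in range(n_words):
                n = counts[r][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | row, w) proportional to P(z|row) * P(w|z)
                post = normalize([p_z_row[r][z] * p_w_z[z][w] for z in range(n_topics)])
                # Accumulate expected counts n(row, w) * P(z | row, w)
                for z in range(n_topics):
                    acc_w_z[z][w] += n * post[z]
                    acc_z_row[r][z] += n * post[z]
        # M-step: renormalise the expected counts
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_row = [normalize(row) for row in acc_z_row]
    return p_z_row, p_w_z


def fold_in(row, p_w_z, init, iters=50):
    """Estimate P(z|object) by EM with P(w|z) frozen, starting from `init`."""
    n_topics = len(p_w_z)
    p_z = normalize([p + 1e-6 for p in init])  # small smoothing so no class is locked at 0
    for _ in range(iters):
        acc = [0.0] * n_topics
        for w, n in enumerate(row):
            if n == 0:
                continue
            post = normalize([p_z[z] * p_w_z[z][w] for z in range(n_topics)])
            for z in range(n_topics):
                acc[z] += n * post[z]
        p_z = normalize(acc)
    return p_z


def two_pass_plsa(counts, n_groups, n_topics):
    # Pass 1: cluster objects by their word profiles, then run PLSA on group totals.
    labels = kmeans([normalize(row) for row in counts], n_groups)
    group_counts = [[0] * len(counts[0]) for _ in range(n_groups)]
    for row, g in zip(counts, labels):
        for w, n in enumerate(row):
            group_counts[g][w] += n
    p_z_group, p_w_z = plsa(group_counts, n_topics)
    # Pass 2: extend to individual objects, seeding each with its group's P(z|g).
    p_z_obj = [fold_in(row, p_w_z, p_z_group[g]) for row, g in zip(counts, labels)]
    best = [max(range(n_topics), key=lambda z: p[z]) for p in p_z_obj]
    return p_z_obj, best


# Toy data: objects 0-2 use words 0-1, objects 3-5 use words 2-3.
counts = [
    [5, 3, 0, 0], [4, 4, 0, 0], [6, 2, 0, 0],
    [0, 0, 3, 5], [0, 0, 4, 4], [0, 0, 2, 6],
]
probs, best = two_pass_plsa(counts, n_groups=2, n_topics=2)
print(best)  # most likely latent class per object
```

In this sketch the EM loop of pass 1 iterates over group totals rather than over every object, which is where the storage and computation savings described in the abstract would come from. Pass 2 then costs only one small fold-in per object.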

Publication/grant documents

US07844449B2 Scalable probabilistic latent semantic analysis; publication/grant date: 2010-11-30