Patent application
- Title: Scalable probabilistic latent semantic analysis
- Title (Chinese): 可扩展概率潜在语义分析
- Application number: US11392763
- Filing date: 2006-03-30
- Publication number: US20070239431A1
- Publication date: 2007-10-11
- Inventors: Chenxi Lin, Jie Han, Guirong Xue, Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Jian Wang
- Applicants: Chenxi Lin, Jie Han, Guirong Xue, Hua-Jun Zeng, Benyu Zhang, Zheng Chen, Jian Wang
- Applicant address: US WA Redmond
- Assignee: Microsoft Corporation
- Current assignee: Microsoft Corporation
- Current assignee address: US WA Redmond
- Main classification: G06F17/27
- IPC classification: G06F17/27
Abstract:
A scalable two-pass probabilistic latent semantic analysis (PLSA) methodology is disclosed that may perform more efficiently, and in some cases more accurately, than traditional PLSA, especially where large and/or sparse data sets are provided for analysis. The improved methodology can greatly reduce the storage and/or computational costs of training a PLSA model. In the first pass of the two-pass methodology, objects are clustered into groups, and PLSA is performed on the groups instead of on the original individual objects. In the second pass, the conditional probability of a latent class, given an object, is obtained. This may be done by extending the training results of the first pass. During the second pass, the most likely latent classes for each object are identified.
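
The two-pass flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several parts are assumptions: the function names `plsa`, `fold_in`, and `two_pass_plsa` are hypothetical, the cluster labels in `groups` are taken as given (the abstract does not say how objects are clustered), and the second pass uses the standard PLSA "fold-in" step (EM with the word distributions held fixed) as one plausible reading of "extending the training results of the first pass".

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for empty rows


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Plain EM for PLSA on a (rows x words) count matrix.

    Returns P(z|row) with shape (rows, topics) and
    P(w|z) with shape (topics, words).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_words = counts.shape
    p_z_d = rng.random((n_rows, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z | row, w), shape (rows, topics, words)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + EPS)
        # M-step: re-estimate both distributions from expected counts
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + EPS
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + EPS
    return p_z_d, p_w_z


def fold_in(counts, p_w_z, n_iter=30):
    """Estimate P(z|object) per object while keeping P(w|z) fixed."""
    n_rows, n_topics = counts.shape[0], p_w_z.shape[0]
    p_z_d = np.full((n_rows, n_topics), 1.0 / n_topics)
    for _ in range(n_iter):
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + EPS)
        p_z_d = (counts[:, None, :] * post).sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + EPS
    return p_z_d


def two_pass_plsa(counts, groups, n_topics, top_k=1):
    """First pass: PLSA on group-level counts. Second pass: per-object P(z|d)."""
    n_groups = int(groups.max()) + 1
    # Sum the count rows of all objects that share a group label.
    group_counts = np.zeros((n_groups, counts.shape[1]))
    np.add.at(group_counts, groups, counts)
    # Pass 1: train on the (much smaller) group-by-word matrix.
    _, p_w_z = plsa(group_counts, n_topics)
    # Pass 2: recover P(z|d) for each original object, then rank classes.
    p_z_d = fold_in(counts, p_w_z)
    top = np.argsort(-p_z_d, axis=1)[:, :top_k]
    return p_z_d, top


if __name__ == "__main__":
    # Toy data: 4 objects x 4 words, pre-assigned to 2 groups (assumed labels).
    counts = np.array([[5, 4, 0, 0], [4, 5, 0, 0],
                       [0, 0, 5, 4], [0, 0, 4, 5]], dtype=float)
    groups = np.array([0, 0, 1, 1])
    p_z_d, top = two_pass_plsa(counts, groups, n_topics=2)
    print(p_z_d.round(3))
    print(top.ravel())
```

In this sketch the savings come from the first pass, where EM runs over one row per group rather than one row per object; the second pass touches every object but only updates P(z|d), which is cheaper than a full EM iteration over both distributions.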
Published/granted documents
- US07844449B2 Scalable probabilistic latent semantic analysis (publication/grant date: 2010-11-30)