- Patent title: Data-parallel parameter estimation of the Latent Dirichlet allocation model by greedy Gibbs sampling
- Application number: US14599272
- Filing date: 2015-01-16
- Publication number: US10860829B2
- Publication date: 2020-12-08
- Inventors: Jean-Baptiste Tristan, Guy Steele
- Applicant: Oracle International Corporation
- Applicant address: US CA Redwood Shores
- Patentee: Oracle International Corporation
- Current patentee: Oracle International Corporation
- Current patentee address: US CA Redwood Shores
- Agency: Hickman Palermo Becker Bingham LLP
- Main classification: G06K9/00
- IPC classification: G06K9/00; G06F16/93
Abstract:
A novel data-parallel algorithm is presented for topic modeling on highly-parallel hardware architectures. The algorithm is a Markov-Chain Monte Carlo algorithm used to estimate the parameters of the LDA topic model. This algorithm is based on a highly parallel partially-collapsed Gibbs sampler, but replaces a stochastic step that draws from a distribution with an optimization step that computes the mean of the distribution directly and deterministically. This algorithm is correct, it is statistically performant, and it is faster than state-of-the-art algorithms because it can exploit massive amounts of parallelism when run on a highly-parallel architecture, such as a GPU. Furthermore, the partially-collapsed Gibbs sampler converges about as fast as the collapsed Gibbs sampler and identifies solutions that are as good as, or even better than, those of the collapsed Gibbs sampler.
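The substitution described in the abstract, replacing a random draw with the mean of the same distribution, can be illustrated with a minimal sketch. It assumes the replaced draw is a Dirichlet posterior over one topic's word distribution, which is typical of LDA Gibbs samplers but is not stated in the abstract. The function names, the symmetric prior `beta`, and the toy counts are hypothetical and are not taken from the patent.

```python
import random

def dirichlet_draw(params, rng):
    """Stochastic step: sample from Dirichlet(params) via normalized Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(gammas)
    return [g / total for g in gammas]

def dirichlet_mean(params):
    """Deterministic replacement: the mean of Dirichlet(params)."""
    total = sum(params)
    return [a / total for a in params]

def update_topic_word(counts, beta, greedy=True, rng=None):
    """Update one topic's word distribution from its word counts.

    counts[v] is the number of tokens of word v currently assigned to the
    topic; beta is an assumed symmetric Dirichlet prior (illustrative only).
    """
    params = [c + beta for c in counts]
    if greedy:
        # Greedy variant: no randomness, so every topic (and every word)
        # can be computed independently and in parallel.
        return dirichlet_mean(params)
    return dirichlet_draw(params, rng or random.Random())

# Toy example: a vocabulary of three words.
counts = [3, 0, 1]
phi_greedy = update_topic_word(counts, beta=1.0)
phi_sampled = update_topic_word(counts, beta=1.0, greedy=False,
                                rng=random.Random(0))
```

In this sketch the greedy update is a pure function of the counts, which is what makes it easy to map onto data-parallel hardware such as a GPU; the sampled variant is kept only for comparison.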