Granted invention patent
US08713021B2 Unsupervised document clustering using latent semantic density analysis (in force)

  • Patent title: Unsupervised document clustering using latent semantic density analysis
  • Application number: US12831909
    Filing date: 2010-07-07
  • Publication number: US08713021B2
    Publication date: 2014-04-29
  • Inventor: Jerome R. Bellegarda
  • Applicant: Jerome R. Bellegarda
  • Applicant address: US CA Cupertino
  • Assignee: Apple Inc.
  • Current assignee: Apple Inc.
  • Current assignee address: US CA Cupertino
  • Agent: Morrison & Foerster LLP
  • Main classification: G06F17/30
  • IPC classification: G06F17/30
Abstract:
According to one embodiment, a latent semantic mapping (LSM) space is generated from a collection of a plurality of documents, where the LSM space includes a plurality of document vectors, each representing one of the documents in the collection. For each of the document vectors considered as a centroid document vector, a group of document vectors is identified in the LSM space that are within a predetermined hypersphere diameter from the centroid document vector. As a result, multiple groups of document vectors are formed. The predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space. Thereafter, a group from the plurality of groups is designated as a cluster of document vectors, where the designated group contains a maximum number of document vectors among the plurality of groups.
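
The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the document vectors in the LSM space are already available as plain coordinate tuples, it uses Euclidean distance as a stand-in for the closeness measure (the abstract does not specify one), and it treats "within the hypersphere diameter" as a distance of at most `diameter` from the candidate centroid. The function name `densest_group` and the sample vectors are hypothetical.

```python
import math


def densest_group(vectors, diameter):
    """Treat every vector as a candidate centroid and return the largest group.

    Returns (centroid_index, member_indices). Euclidean distance is an
    assumption; the abstract only speaks of a "closeness measure".
    """
    best_centroid, best_members = None, []
    for i, centroid in enumerate(vectors):
        # Group: every vector within `diameter` of this candidate centroid.
        members = [
            j for j, v in enumerate(vectors)
            if math.dist(centroid, v) <= diameter
        ]
        # Keep the group with the most members (first one wins on ties).
        if len(members) > len(best_members):
            best_centroid, best_members = i, members
    return best_centroid, best_members


# Hypothetical 2-D "document vectors": three close together, two elsewhere.
doc_vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
centroid, cluster = densest_group(doc_vectors, diameter=0.5)
```

Here the group around the first vector holds three members and is designated as the cluster. The sketch compares every pair of vectors, so its cost grows quadratically with the size of the collection.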