Granted invention patent
- Patent title: Clustering of text units using dimensionality reduction of multi-dimensional arrays
- Patent title (Chinese): 使用多维阵列的维数降低的文本单元的聚类
- Application number: US13656315
- Filing date: 2012-10-19
- Publication number: US09141882B1
- Publication date: 2015-09-22
- Inventors: Baoqiang Cao, T. Ryan Fitz-Gibbon, Lucas Forehand, Ryan McHale, Bradley Burke
- Applicant: Networked Insights, LLC
- Applicant address: US WI Madison
- Assignee: NETWORKED INSIGHTS, LLC
- Current assignee: NETWORKED INSIGHTS, LLC
- Current assignee address: US WI Madison
- Agency: Foley & Lardner LLP
- Main classification: G06E1/00
- IPC classification: G06E1/00; G06E3/00; G06F15/18; G06G7/00; G06K9/62; G06F17/30
Abstract:
Methods, systems, and apparatuses, including computer programs encoded on computer-readable media, for tokenizing n-grams from a plurality of text units. A multi-dimensional array is created having a plurality of dimensions based upon the plurality of text units and the n-grams from the plurality of text units. The multi-dimensional array is normalized, and its dimensionality is reduced. The reduced-dimensionality array is clustered to generate a plurality of clusters, each of which includes one or more of the plurality of text units.
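The steps named in the abstract (n-gram tokenization, array construction, normalization, dimensionality reduction, clustering) can be illustrated with a minimal Python sketch. This is not the patented implementation: the abstract does not say which normalization, reduction, or clustering technique is used, so L2 row normalization, a Gaussian random projection, and a plain k-means are assumptions chosen for brevity. All function names and parameters (`ngrams`, `build_matrix`, `l2_normalize`, `random_projection`, `kmeans`, `cluster_texts`, `n_max`, `n_components`, `n_clusters`) are hypothetical.

```python
import math
import random
from collections import Counter


def ngrams(text, n_max=2):
    """Tokenize a text unit into word n-grams of length 1..n_max."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def build_matrix(texts, n_max=2):
    """Build a 2-D array: rows are text units, columns are distinct n-grams."""
    counts = [Counter(ngrams(t, n_max)) for t in texts]
    vocab = sorted(set().union(*counts))
    return [[c[g] for g in vocab] for c in counts], vocab


def l2_normalize(rows):
    """Scale each row to unit Euclidean length (assumed normalization)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.append([v / norm for v in row])
    return out


def random_projection(rows, k, seed=0):
    """Reduce each row to k dimensions with a Gaussian random projection.

    Assumption: the abstract does not name a reduction technique; random
    projection stands in here because it needs no external libraries.
    """
    rng = random.Random(seed)
    d = len(rows[0])
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)]
            for _ in range(d)]
    return [[sum(row[i] * proj[i][j] for i in range(d)) for j in range(k)]
            for row in rows]


def kmeans(rows, k, iters=20, seed=0):
    """Plain k-means (assumed clustering step); returns one label per row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign every row to its nearest centroid (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(r, centroids[c])))
                  for r in rows]
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def cluster_texts(texts, n_components=3, n_clusters=2):
    """Run the full pipeline and return a cluster label for each text unit."""
    matrix, _ = build_matrix(texts)
    reduced = random_projection(l2_normalize(matrix), n_components)
    return kmeans(reduced, n_clusters)


if __name__ == "__main__":
    sample = ["great phone battery", "great phone battery life",
              "terrible pizza delivery", "pizza delivery was terrible"]
    print(cluster_texts(sample))
```

Each returned label identifies the cluster a text unit falls into, so every cluster contains one or more of the input text units. With real data, a reduction method that adapts to the data (for example a truncated singular value decomposition) would likely separate topics more reliably than a random projection; the choice here only keeps the sketch free of dependencies.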