Granted invention patent
US09141882B1 Clustering of text units using dimensionality reduction of multi-dimensional arrays (in force)

Clustering of text units using dimensionality reduction of multi-dimensional arrays
Abstract:
Methods, systems, and apparatuses, including computer programs encoded on computer-readable media, for tokenizing n-grams from a plurality of text units. A multi-dimensional array is created having a plurality of dimensions based upon the plurality of text units and the n-grams from the plurality of text units. The multi-dimensional array is normalized, and the dimensionality of the multi-dimensional array is reduced. The reduced-dimensionality multi-dimensional array is clustered to generate a plurality of clusters, each of which includes one or more of the plurality of text units.
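The steps named in the abstract (tokenize n-grams, build the array, normalize, reduce dimensionality, cluster) can be illustrated with a minimal Python sketch. The abstract does not say which normalization, reduction, or clustering technique is used, so everything concrete here is an assumption made for illustration: word bigrams as the n-grams, L2 row normalization, a Gaussian random projection as the dimensionality reduction, and plain k-means as the clustering step. All function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter


def ngrams(text, n=2):
    """Tokenize one text unit into word n-grams (lowercased, whitespace split)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_array(text_units, n=2):
    """Rows are text units, columns are distinct n-grams, cells are counts."""
    counts = [Counter(ngrams(t, n)) for t in text_units]
    vocab = sorted(set().union(*counts))
    return [[c[g] for g in vocab] for c in counts], vocab


def normalize(rows):
    """L2-normalize each row (assumed; the abstract names no scheme)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # guard all-zero rows
        out.append([v / norm for v in row])
    return out


def reduce_dims(rows, dims, rng):
    """Gaussian random projection to `dims` columns (a stand-in technique)."""
    d = len(rows[0])
    proj = [[rng.gauss(0, 1) / math.sqrt(dims) for _ in range(dims)]
            for _ in range(d)]
    return [[sum(row[i] * proj[i][j] for i in range(d)) for j in range(dims)]
            for row in rows]


def kmeans(points, k, rng, iters=20):
    """Plain k-means (a stand-in technique); returns one label per point."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def cluster_text_units(text_units, n=2, dims=3, k=2, seed=0):
    """Run the whole pipeline and group the text units by cluster label."""
    rng = random.Random(seed)
    array, _vocab = build_array(text_units, n)
    reduced = reduce_dims(normalize(array), dims, rng)
    labels = kmeans(reduced, k, rng)
    clusters = {}
    for text, label in zip(text_units, labels):
        clusters.setdefault(label, []).append(text)
    return labels, clusters


if __name__ == "__main__":
    units = [
        "the quick brown fox",
        "the quick brown fox",
        "stock prices fell sharply",
        "stock prices fell again",
    ]
    labels, clusters = cluster_text_units(units)
    print(labels)
    print(clusters)
```

Each text unit ends up in exactly one cluster, and identical text units always share a cluster because they produce identical rows at every stage. The projection and the k-means initialization are seeded, so a run is repeatable, but which units group together depends on the seed and on the assumed techniques.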