Clustering of text units using dimensionality reduction of multi-dimensional arrays
    Granted invention patent
    Status: in force

    Publication No.: US09141882B1

    Publication date: 2015-09-22

    Application No.: US13656315

    Filing date: 2012-10-19

    Abstract: Methods, systems, and apparatuses, including computer programs encoded on computer-readable media, for tokenizing n-grams from a plurality of text units. A multi-dimensional array is created having a plurality of dimensions based upon the plurality of text units and the n-grams from the plurality of text units. The multi-dimensional array is normalized and its dimensionality is reduced. The reduced-dimensionality array is then clustered to generate a plurality of clusters, each of which includes one or more of the plurality of text units.
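
The steps named in the abstract (n-gram tokenization, array construction, normalization, dimensionality reduction, clustering) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the abstract does not say which normalization, reduction, or clustering technique is used, so unit-length row scaling, a Gaussian random projection, and plain k-means are stand-ins chosen here. All function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter

def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of one text unit."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]

def build_array(text_units, n_max=2):
    """Rows are text units, columns are n-grams, cells are counts."""
    counts = [Counter(ngrams(t, n_max)) for t in text_units]
    vocab = sorted(set().union(*counts))
    return [[c[g] for g in vocab] for c in counts], vocab

def normalize(rows):
    """Assumed normalization: scale each row to unit Euclidean length."""
    out = []
    for r in rows:
        norm = math.sqrt(sum(x * x for x in r)) or 1.0
        out.append([x / norm for x in r])
    return out

def reduce_dims(rows, k, seed=0):
    """Assumed reduction: Gaussian random projection down to k columns."""
    rng = random.Random(seed)
    d = len(rows[0])
    proj = [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(k)]
            for _ in range(d)]
    return [[sum(r[i] * proj[i][j] for i in range(d)) for j in range(k)]
            for r in rows]

def kmeans(points, k, iters=20, seed=0):
    """Assumed clustering: basic k-means; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[c])))
                  for p in points]
        # Move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

def cluster_text_units(text_units, n_clusters=2, n_dims=3):
    """Run the whole sketch: tokenize, build, normalize, reduce, cluster."""
    array, _ = build_array(text_units)
    reduced = reduce_dims(normalize(array), n_dims)
    return kmeans(reduced, n_clusters)
```

Calling `cluster_text_units(["cheap flights to paris", "learn python programming"], n_clusters=2)` returns one cluster label per text unit. A production system would more likely use a sparse matrix and a reduction such as truncated SVD, but the abstract leaves those choices open.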
