Methods and systems for the analysis of large text corpora
    Granted invention patent (legal status: in force)

    Publication number: US09135242B1

    Publication date: 2015-09-15

    Application number: US13832339

    Filing date: 2013-03-15

    IPC classification: G06F17/27 G06F17/28

    CPC classification: G06F17/2785

    Abstract: Computerized methods and systems for the analysis of textual data, including: receiving, from one or more memories at one or more processors, textual data; using the processors, formatting the textual data for analysis and applying a probabilistic topic model to the textual data to extract semantically meaningful topics that collectively describe it; using a keyword weighting module, generating a topic cloud view representing the topics as a tag cloud, each topic being associated with a plurality of keywords; using a topic ordering module, generating a document distribution view representing a distribution of the textual data across multiple topics; using a document entropy calculation module, generating a document scatterplot view representing how many topics are attributable to the textual data; using a temporal topic trend calculation module, generating a temporal view representing changes in the occurrence of topics over time; and displaying one or more of the views to a user.
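    The abstract names the analysis modules but not their formulas. The following is a minimal sketch of what the keyword weighting, topic ordering, document entropy, and temporal trend calculations could compute, assuming a probabilistic topic model (for example LDA, not shown) has already produced document-topic proportions `theta` and topic-word weights `phi`. The function names, the linear font-size scaling, the use of Shannon entropy, and the `2 ** entropy` topic count are illustrative assumptions, not the patented method.

```python
import math
from collections import defaultdict

# Assumed inputs (hypothetical; the patent does not fix these formats):
#   theta[d][k] = P(topic k | document d), one row per document
#   phi[k][w]   = P(word w | topic k), one row per topic
# Both would come from a probabilistic topic model such as LDA (not shown).


def topic_cloud(phi_k, vocab, top_n=5, min_size=10.0, max_size=40.0):
    """Tag-cloud entries for one topic as (keyword, font size) pairs.

    Assumption: font size scales linearly with the keyword's weight.
    """
    top = sorted(range(len(vocab)), key=lambda w: phi_k[w], reverse=True)[:top_n]
    hi, lo = phi_k[top[0]], phi_k[top[-1]]
    span = (hi - lo) or 1.0  # avoid division by zero when weights tie
    return [(vocab[w], min_size + (max_size - min_size) * (phi_k[w] - lo) / span)
            for w in top]


def topic_order(dist, top_n=3):
    """Topic indices for one document, sorted by descending proportion."""
    return sorted(range(len(dist)), key=lambda k: dist[k], reverse=True)[:top_n]


def document_entropy(dist):
    """Shannon entropy (bits) of one document's topic distribution.

    0 means the document belongs to a single topic; higher values mean
    its content is spread across more topics.
    """
    return sum(p * math.log2(1.0 / p) for p in dist if p > 0)


def effective_topic_count(dist):
    """Assumed measure: 2 ** entropy, a rough count of topics a document draws on."""
    return 2 ** document_entropy(dist)


def temporal_trends(theta, timestamps):
    """Mean topic proportion per time bucket (e.g. publication year)."""
    num_topics = len(theta[0]) if theta else 0
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for dist, t in zip(theta, timestamps):
        for k, p in enumerate(dist):
            sums[t][k] += p
        counts[t] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sorted(sums)}


if __name__ == "__main__":
    theta = [[0.5, 0.5], [1.0, 0.0], [0.25, 0.75]]  # toy data, not from the patent
    years = [2012, 2012, 2013]
    for d, dist in enumerate(theta):
        print(d, topic_order(dist), round(document_entropy(dist), 3),
              round(effective_topic_count(dist), 3))
    print(temporal_trends(theta, years))
    print(topic_cloud([0.4, 0.3, 0.2, 0.1], ["model", "topic", "text", "corpus"], top_n=3))
```

    Using entropy gives each document a single number for the scatterplot: a document concentrated on one topic scores 0 bits, while one split evenly between two topics scores 1 bit.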
