-
Publication No.: US09135242B1
Publication Date: 2015-09-15
Application No.: US13832339
Filing Date: 2013-03-15
Applicant: Xiaoyu Wang, Wenwen Dou, William Ribarsky
Inventor: Xiaoyu Wang, Wenwen Dou, William Ribarsky
CPC Classification: G06F17/2785
Abstract: Computerized methods and systems for the analysis of textual data, including: receiving, from one or more memories at one or more processors, textual data; using the processors, formatting the textual data for analysis and applying a probabilistic topic model to the textual data to extract semantically meaningful topics that collectively describe it; using a keyword weighting module, generating a topic cloud view representing the topics as a tag cloud, with each topic being associated with a plurality of keywords; using a topic ordering module, generating a document distribution view representing a distribution of the textual data across multiple topics; using a document entropy calculation module, generating a document scatterplot view representing how many topics are attributable to the textual data; using a temporal topic trend calculation module, generating a temporal view representing changes in the occurrence of topics over time; and displaying one or more of the views to a user.
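The abstract names a document entropy calculation module and a temporal topic trend calculation module but gives no formulas for either. The following is a minimal sketch of what such calculations might look like, under stated assumptions: that each document is already described by a per-topic probability vector from some probabilistic topic model, that "entropy" means Shannon entropy of that vector, and that a "trend" is the per-period sum of topic probabilities. The function names `document_entropy` and `temporal_topic_trend` are hypothetical and do not come from the patent.

```python
import math


def document_entropy(topic_probs):
    """Shannon entropy (in bits) of one document's topic distribution.

    Assumption: low entropy means the document is dominated by a single
    topic; high entropy means it is spread across many topics.
    """
    total = sum(topic_probs)
    if total <= 0:
        raise ValueError("topic_probs must contain a positive weight")
    entropy = 0.0
    for p in topic_probs:
        if p > 0:
            q = p / total  # normalize in case the weights do not sum to 1
            entropy -= q * math.log2(q)
    return entropy


def temporal_topic_trend(doc_topics, doc_periods):
    """Sum topic probabilities per time period.

    doc_topics:  list of per-document topic probability vectors
    doc_periods: list of period labels (e.g. years), one per document
    Returns a dict mapping period -> list of summed topic weights.
    """
    trend = {}
    for probs, period in zip(doc_topics, doc_periods):
        totals = trend.setdefault(period, [0.0] * len(probs))
        for k, p in enumerate(probs):
            totals[k] += p
    return trend


if __name__ == "__main__":
    # Hypothetical data: three documents, two topics.
    docs = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
    periods = ["2013", "2013", "2014"]
    for probs in docs:
        print(probs, document_entropy(probs))
    print(temporal_topic_trend(docs, periods))
```

In a scatterplot view of the kind the abstract describes, the entropy value could serve as one axis, so that single-topic documents and multi-topic documents fall in different regions; how the patent actually positions documents is not stated in this record.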