Data analytics system and methods for text data

    公开(公告)号:US11010548B2

    公开(公告)日:2021-05-18

    申请号:US16839192

    申请日:2020-04-03

    摘要: Aspects of the subject disclosure may include, for example, a process that performs a statistical, natural-language processing analysis on a group of text documents to determine a group of topics. The topics are determined according to parameters obtained by training on a sample of documents. One or more topics in a subset of topics are associated to each document, resulting in topic-document pairs. A bias is identified for each topic-document pair, and clusters of topics are created from the subset of topics. Each cluster of topics is determined from a value for each bias of each topic-document pair and from a frequency of occurrence of each topic. Each cluster is presentable according to a corresponding image configuration based on all or a subset of the bias dimensions and the frequency of occurrence of topics in a cluster that distinguishes the cluster from other clusters. Other embodiments are disclosed.

    DATA ANALYTICS SYSTEM AND METHODS FOR TEXT DATA

    公开(公告)号:US20210232764A1

    公开(公告)日:2021-07-29

    申请号:US17232546

    申请日:2021-04-16

    摘要: Aspects of the subject disclosure may include, for example, a process that performs a statistical, natural-language processing analysis on a group of text documents to determine a group of topics. The topics are determined according to parameters obtained by training on a sample of documents. One or more topics in a subset of topics are associated to each document, resulting in topic-document pairs. A bias is identified for each topic-document pair, and clusters of topics are created from the subset of topics. Each cluster of topics is determined from a value for each bias of each topic-document pair and from a frequency of occurrence of each topic. Each cluster is presentable according to a corresponding image configuration based on all or a subset of the bias dimensions and the frequency of occurrence of topics in a cluster that distinguishes the cluster from other clusters. Other embodiments are disclosed.

    DATA ANALYTICS SYSTEM AND METHODS FOR TEXT DATA

    公开(公告)号:US20200234005A1

    公开(公告)日:2020-07-23

    申请号:US16839192

    申请日:2020-04-03

    摘要: Aspects of the subject disclosure may include, for example, a process that performs a statistical, natural-language processing analysis on a group of text documents to determine a group of topics. The topics are determined according to parameters obtained by training on a sample of documents. One or more topics in a subset of topics are associated to each document, resulting in topic-document pairs. A bias is identified for each topic-document pair, and clusters of topics are created from the subset of topics. Each cluster of topics is determined from a value for each bias of each topic-document pair and from a frequency of occurrence of each topic. Each cluster is presentable according to a corresponding image configuration based on all or a subset of the bias dimensions and the frequency of occurrence of topics in a cluster that distinguishes the cluster from other clusters. Other embodiments are disclosed.

    DATA ANALYTICS SYSTEM AND METHODS FOR TEXT DATA

    公开(公告)号:US20190272320A1

    公开(公告)日:2019-09-05

    申请号:US16299871

    申请日:2019-03-12

    IPC分类号: G06F17/27 G06F16/28 G06F16/35

    摘要: Aspects of the subject disclosure may include, for example, a process that performs a statistical, natural-language processing analysis on a group of text documents to determine a group of topics. The topics are determined according to parameters obtained by training on a sample of documents. One or more topics in a subset of topics are associated to each document, resulting in topic-document pairs. A bias is identified for each topic-document pair, and clusters of topics are created from the subset of topics. Each cluster of topics is determined from a value for each bias of each topic-document pair and from a frequency of occurrence of each topic. Each cluster is presentable according to a corresponding image configuration based on all or a subset of the bias dimensions and the frequency of occurrence of topics in a cluster that distinguishes the cluster from other clusters. Other embodiments are disclosed.