Clustering internet messages
    1.
    Granted invention patent (in force)

    Publication No.: US08386487B1

    Publication date: 2013-02-26

    Application No.: US12940917

    Filing date: 2010-11-05

    IPC classification: G06F17/30

    CPC classification: G06F17/30705 G06F17/30997

    Abstract: Among other disclosed subject matter, a computer-implemented method includes receiving a plurality of documents at a server and adding meta-data to each of the plurality of documents. The meta-data added to a particular document comprises at least one of task flow features of the particular document or data associated with an author of the particular document. The method also includes selecting a plurality of features for use in clustering the plurality of documents. The plurality of features includes a subset of the meta-data and a subset of content associated with one or more of the plurality of documents. The method also includes clustering the plurality of documents based on the plurality of features, including identifying a topic associated with each cluster, and preparing a report based on the clusters and metric information associated with each cluster. The method also includes displaying the report to a user.

    Method and system for remediating topic drift in near-real-time classification of customer feedback
    2.
    Granted invention patent (in force)

    Publication No.: US09111218B1

    Publication date: 2015-08-18

    Application No.: US13530667

    Filing date: 2012-06-22

    Abstract: A method and system of classifying documents is provided. The method includes receiving a stream of documents from at least one user, wherein each document includes a topic of information relating to a customer support issue or sentiment. The method includes classifying each of the received documents using a plurality of trained classifiers, the classification based on a voting by the trained classifiers, each document labeled according to a similar topic. A drift of the topic of one or more of the classifications is determined, wherein the drift is related to the received documents that include information relating to an unclassified customer support issue or sentiment. If the determined drift exceeds a predetermined threshold range, the plurality of classifiers is rebuilt to include a second set of classifiers trained to recognize the unclassified customer support issue or sentiment.
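
The voting and drift check described in this abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how drift is measured, so the sketch assumes drift is the share of recent documents on which no label wins a strict majority of classifier votes. The names `vote`, `drift_ratio`, and `needs_rebuild` are hypothetical, and each classifier is assumed to be a callable that returns a label or `None` (abstain).

```python
from collections import Counter


def vote(classifiers, doc):
    """Return the label backed by a strict majority of classifiers, else None."""
    labels = [clf(doc) for clf in classifiers]
    votes = Counter(label for label in labels if label is not None)
    if not votes:
        return None  # every classifier abstained
    label, count = votes.most_common(1)[0]
    return label if count > len(classifiers) / 2 else None


def drift_ratio(classifiers, docs):
    """Assumed drift measure: fraction of docs that receive no majority label."""
    if not docs:
        return 0.0
    unclassified = sum(1 for doc in docs if vote(classifiers, doc) is None)
    return unclassified / len(docs)


def needs_rebuild(classifiers, docs, threshold):
    """True when the assumed drift measure exceeds the given threshold."""
    return drift_ratio(classifiers, docs) > threshold
```

In such a setup, a `True` result from `needs_rebuild` would be the signal to train additional classifiers on the documents that went unlabeled.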

    Methods and systems for constructing a taxonomy based on hierarchical clustering
    3.
    Granted invention patent (in force)

    Publication No.: US09110984B1

    Publication date: 2015-08-18

    Application No.: US13531147

    Filing date: 2012-06-22

    IPC classification: G06F17/30

    Abstract: Methods and systems for constructing a taxonomy based on hierarchical clustering are provided. The taxonomy is generated by first constructing a hierarchy of clusters using a clustering algorithm. A first level of the hierarchy of clusters is generated by providing a plurality of content files to a clustering algorithm. Subsequent levels of the hierarchy are generated by providing the clusters of the preceding levels to the clustering algorithm. Labels that characterize each cluster within the hierarchy are assigned to corresponding clusters. Labels and clusters are combined to form the taxonomy.

    Contextual text interpretation
    4.
    Granted invention patent (in force)

    Publication No.: US08620918B1

    Publication date: 2013-12-31

    Application No.: US13364177

    Filing date: 2012-02-01

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30707

    Abstract: Among other disclosed subject matter, a computer-implemented method includes receiving a plurality of electronic documents associated with a domain at a server. Each of the plurality of electronic documents includes meta-data and textual content. The method includes identifying one or more text strings in the textual content that are to be processed differently than an identical or similar text string in other electronic documents, and associating, with the electronic document, data indicating that each of the identified text strings is to be processed differently than an identical or similar text string in other electronic documents. The method also includes performing an analysis of the electronic documents to identify one or more subsets of the electronic documents that include related subject matter. A plurality of degrees of relatedness can be associated with text strings associated with data indicating that each of the text strings is to be processed differently.

    Methods and systems for classifying data using a hierarchical taxonomy
    5.
    Granted invention patent (in force)

    Publication No.: US09367814B1

    Publication date: 2016-06-14

    Application No.: US13530505

    Filing date: 2012-06-22

    IPC classification: G06N99/00

    Abstract: A method and system for classifying documents is provided. A set of document classifiers is generated by applying a classification algorithm to a trusted corpus that includes a set of training documents representing a taxonomy. One or more of the generated document classifiers are executed against a plurality of input documents to create a plurality of classified documents. Each classified document is associated with a classification within the taxonomy and a classification confidence level. One or more classified documents that are associated with a classification confidence level below a predetermined threshold value are selected to create a set of low-confidence documents. The low-confidence documents are disassociated from each of the associated classifications. A user is prompted to enter a classification within the taxonomy for at least one low-confidence document. The low-confidence document is associated with the entered classification and with a predetermined confidence level to create a newly classified document.
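
The low-confidence review loop in this abstract can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the patented system: the `ClassifiedDoc` type, the `relabel_low_confidence` function, the `ask_user` callback (standing in for the user prompt), and the default `manual_confidence` of 1.0 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ClassifiedDoc:
    text: str
    label: Optional[str]   # classification within the taxonomy, None if unassigned
    confidence: float      # classifier confidence in [0, 1]


def relabel_low_confidence(
    docs: List[ClassifiedDoc],
    threshold: float,
    ask_user: Callable[[str], str],
    manual_confidence: float = 1.0,
) -> List[ClassifiedDoc]:
    """Reassign labels for documents whose confidence is below the threshold."""
    relabeled = []
    for doc in docs:
        if doc.confidence < threshold:
            doc.label = None                     # disassociate the old classification
            doc.label = ask_user(doc.text)       # user supplies a taxonomy label
            doc.confidence = manual_confidence   # predetermined confidence level
            relabeled.append(doc)
    return relabeled
```

Documents at or above the threshold are left untouched, so the function can be run repeatedly as new classifier output arrives.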

    Clustering internet resources
    6.
    Granted invention patent (in force)

    Publication No.: US08423551B1

    Publication date: 2013-04-16

    Application No.: US12940905

    Filing date: 2010-11-05

    IPC classification: G06F17/30

    CPC classification: G06F17/30867

    Abstract: Among other disclosed subject matter, a computer-implemented method includes receiving one or more keywords and identifying a plurality of content items. The content items comprise network content that includes the one or more keywords. The method also includes clustering the plurality of content items and identifying a topic associated with each cluster. The method also includes determining a relative importance of a particular topic and analyzing clusters associated with the particular topic to determine opinion data associated with the particular topic. The method includes preparing a report based on the clusters, relative importance, and the opinion data, and displaying the report to a user.

    Methods and systems for partitioning documents having customer feedback and support content
    7.
    Granted invention patent (in force)

    Publication No.: US09436758B1

    Publication date: 2016-09-06

    Application No.: US13530619

    Filing date: 2012-06-22

    IPC classification: G06F17/30

    CPC classification: G06F17/30654 G06F17/30705

    Abstract: Methods and systems for use in partitioning documents having customer feedback and support content are provided. One exemplary computer-implemented method, which includes executing instructions stored on a computer-readable medium, includes receiving a plurality of documents, at least a portion of the plurality of documents including customer feedback related to an issue and support content responsive to the customer feedback; filtering the plurality of documents to retain one of the customer feedback and the support content within a plurality of filtered documents; partitioning the plurality of filtered documents into multiple clusters; receiving a new document; and partitioning the new document based on at least one keyword included in one of the multiple clusters of filtered documents.

    Methods and systems for organizing content
    8.
    Granted invention patent (in force)

    Publication No.: US08972404B1

    Publication date: 2015-03-03

    Application No.: US13531081

    Filing date: 2012-06-22

    IPC classification: G06F17/30

    CPC classification: G06F17/30705 G06F17/3071

    Abstract: A computer-implemented method executes instructions stored on a computer-readable medium. The method includes accessing a hierarchy of clusters, wherein each cluster includes at least one content file, and a label is associated with each cluster. The method further includes calculating a topic purity score for each cluster, and selecting a first cluster and a second cluster from the hierarchy of clusters, wherein the topic purity scores of the first cluster and the second cluster are each less than a purity threshold. The method also includes creating a third cluster by combining the content files included within the first cluster and the second cluster, determining a parent category of the first cluster and the second cluster, wherein the parent category is at a level within the hierarchy higher than a level of the first cluster and the second cluster, and associating a label of the parent category with the third cluster.
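
The merge step in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented method: the abstract does not define the topic purity score, so the sketch assumes it is the share of files in a cluster that carry the cluster's most common topic tag. The functions `purity` and `merge_pair`, the `(file_name, topic)` tuple layout, and the `parent_of` mapping are hypothetical.

```python
from collections import Counter


def purity(files):
    """Assumed purity: share of files carrying the cluster's most common topic.

    files is a non-empty list of (file_name, topic) tuples.
    """
    topics = Counter(topic for _, topic in files)
    return topics.most_common(1)[0][1] / len(files)


def merge_pair(clusters, parent_of, first, second, threshold):
    """Combine two low-purity sibling clusters under their parent's label.

    clusters maps a cluster label to its files; parent_of maps a cluster
    label to the label of its parent category one level up.
    Returns (parent_label, combined_files).
    """
    for label in (first, second):
        if purity(clusters[label]) >= threshold:
            raise ValueError(f"cluster {label!r} is not below the purity threshold")
    if parent_of[first] != parent_of[second]:
        raise ValueError("clusters do not share a parent category")
    return parent_of[first], clusters[first] + clusters[second]
```

The returned pair is the third cluster: the combined content files, labeled with the parent category.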

    Cross-channel clusters of information
    9.
    Granted invention patent (in force)

    Publication No.: US08543577B1

    Publication date: 2013-09-24

    Application No.: US13038835

    Filing date: 2011-03-02

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/3071

    Abstract: A computer-implemented method includes receiving, by one or more computer systems, first information from a first channel and second information from a second channel; merging the first information with the second information; applying an unsupervised clustering model to the merged information; and generating, based on results of the applying, a cross-channel cluster, the cross-channel cluster including (i) a portion of the first information associated with a subject matter, and (ii) a portion of the second information associated with the subject matter.

    Classification of clustered documents based on similarity scores
    10.
    Granted invention patent (in force)

    Publication No.: US08543576B1

    Publication date: 2013-09-24

    Application No.: US13479188

    Filing date: 2012-05-23

    IPC classification: G06F17/30

    CPC classification: G06F17/30707

    Abstract: Among other disclosed subject matter, a computer-implemented method includes receiving a set of clusters of documents and calculating a similarity score for each cluster, wherein the similarity score is based at least in part on features included in the documents in the cluster and indicates a measure of similarity of the documents in the cluster. Each cluster associated with a similarity score greater than a first threshold is identified as satisfying a quality assurance requirement. Each cluster associated with a similarity score less than a second threshold is identified as failing the quality assurance requirement. For each cluster associated with a similarity score less than or equal to the first threshold and greater than or equal to the second threshold, at least a subset of documents in the cluster is reviewed to determine whether the cluster satisfies the quality assurance requirement.
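
The three-way threshold rule in this abstract can be sketched in Python as follows. It is a minimal illustration, not the patented implementation: the function name `triage_clusters`, the bucket names, and the assumption that similarity scores are already computed and supplied as a mapping from cluster id to score are all hypothetical.

```python
def triage_clusters(scores, first_threshold, second_threshold):
    """Sort cluster ids into pass / fail / review buckets by similarity score.

    scores maps a cluster id to its precomputed similarity score.
    """
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    buckets = {"pass": [], "fail": [], "review": []}
    for cluster_id, score in scores.items():
        if score > first_threshold:
            buckets["pass"].append(cluster_id)     # satisfies the QA requirement
        elif score < second_threshold:
            buckets["fail"].append(cluster_id)     # fails the QA requirement
        else:
            buckets["review"].append(cluster_id)   # between thresholds, inclusive
    return buckets
```

Scores equal to either threshold fall into the review bucket, matching the inclusive bounds stated in the abstract.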
