Generating training documents
    1.
    Granted invention patent
    Generating training documents (in force)

    Publication No.: US09002102B2

    Publication date: 2015-04-07

    Application No.: US13725487

    Filing date: 2012-12-21

    CPC classification number: G06K9/6256 G06F17/30598 G06N99/005

    Abstract: A method of generating training documents for training a classifying device comprises, with a processor, determining a number of sub-samples in a number of original documents, and creating a number of pseudo-documents from the sub-samples, the pseudo-documents comprising a portion of the number of sub-samples. A device for training a classifying device comprises a processor, and a memory communicatively coupled to the processor. The memory comprises a sampling module to, when executed by the processor, determine a number of sub-samples in a number of original documents, a pseudo-document creation module to, when executed by the processor, create a number of pseudo-documents from the sub-samples, the pseudo-documents comprising a portion of the number of sub-samples, and a training module to, when executed by the processor, train a classifying device to classify textual documents based on the pseudo-documents.
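
    The sub-sampling idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how sub-samples are delimited or chosen, so the fixed-size word chunks, the random selection, and the names `split_into_subsamples` and `make_pseudo_documents` are all assumptions made for illustration, not the patented method.

```python
import random


def split_into_subsamples(documents, chunk_size=4):
    """Cut each document into consecutive word chunks (assumed form of a sub-sample)."""
    subsamples = []
    for text in documents:
        words = text.split()
        for start in range(0, len(words), chunk_size):
            subsamples.append(" ".join(words[start:start + chunk_size]))
    return subsamples


def make_pseudo_documents(subsamples, count, per_document, seed=0):
    """Build `count` pseudo-documents, each from a random portion of the sub-samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    k = min(per_document, len(subsamples))
    return [" ".join(rng.sample(subsamples, k)) for _ in range(count)]


# Hypothetical original documents.
originals = [
    "the quick brown fox jumps over the lazy dog",
    "pack my box with five dozen liquor jugs",
]
subsamples = split_into_subsamples(originals, chunk_size=4)
pseudo_docs = make_pseudo_documents(subsamples, count=3, per_document=2)
```

    Each pseudo-document contains only a portion of the sub-samples, so a small set of originals can yield many distinct training documents for the classifier.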


    Generating a feature set
    2.
    Granted invention patent

    Publication No.: US10331799B2

    Publication date: 2019-06-25

    Application No.: US14780707

    Filing date: 2013-03-28

    Abstract: A technique to generate a feature set. A plurality of samples from a data set can be clustered. Features can be selected based on the clusters. The features can be added to the feature set. Additional samples can be clustered and features selected and added to the feature set until a convergence threshold is reached.
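
    The loop this abstract describes (cluster a batch of samples, select features per cluster, add them, repeat until convergence) can be sketched as follows. The abstract names no clustering algorithm, selection rule, or convergence measure, so the single-pass leader clustering on Jaccard similarity, the "most frequent terms per cluster" rule, and the "too few new features" stopping test are all assumptions chosen to keep the example short.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_batch(samples, similarity=0.2):
    """Single-pass leader clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of token sets; element 0 is the leader
    for tokens in samples:
        for cluster in clusters:
            if jaccard(tokens, cluster[0]) >= similarity:
                cluster.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters


def top_terms(cluster, n=2):
    """Select the n most frequent terms in a cluster (ties broken alphabetically)."""
    counts = Counter(term for tokens in cluster for term in tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def generate_feature_set(data, batch_size=4, min_new_features=1):
    """Cluster successive batches and grow the feature set until a batch adds too little."""
    feature_set = set()
    for start in range(0, len(data), batch_size):
        batch = [set(text.lower().split()) for text in data[start:start + batch_size]]
        before = len(feature_set)
        for cluster in cluster_batch(batch):
            feature_set.update(top_terms(cluster))
        if len(feature_set) - before < min_new_features:  # assumed convergence threshold
            break
    return feature_set


# Hypothetical data set of short text samples.
data = [
    "invoice payment due", "payment invoice overdue",
    "server outage alert", "outage server restart",
    "invoice payment reminder", "server outage resolved",
    "payment invoice received", "server restart outage",
]
features = generate_feature_set(data)
```

    On this toy data the second batch contributes no new features, so the loop stops after two batches.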

    GENERATING A FEATURE SET
    3.
    Invention application
    GENERATING A FEATURE SET (under examination, published)

    Publication No.: US20160085811A1

    Publication date: 2016-03-24

    Application No.: US14780707

    Filing date: 2013-03-28

    CPC classification number: G06F16/2457 G06F16/2465 G06F16/285

    Abstract: A technique to generate a feature set. A plurality of samples from a data set can be clustered. Features can be selected based on the clusters. The features can be added to the feature set. Additional samples can be clustered and features selected and added to the feature set until a convergence threshold is reached.


    Generating training documents
    4.
    Granted invention patent
    Generating training documents (in force)

    Publication No.: US09165258B2

    Publication date: 2015-10-20

    Application No.: US13709773

    Filing date: 2012-12-10

    CPC classification number: G06N99/005 G06K9/00442 G06K9/00463 G06K9/4676

    Abstract: A method of generating training documents for training a classifying device comprises, with a processor, sampling from a distribution of words in a number of original documents, and creating a number of pseudo-documents from the distribution of words, the pseudo-documents comprising a similar distribution of words as the original documents. A device for classifying textual documents comprises a processor; and a memory communicatively coupled to the processor, the memory comprising a sampling module to, when executed by the processor, determine the distribution of words in a number of original documents, a pseudo-document creation module to, when executed by the processor, create a number of pseudo-documents from the distribution of words, the pseudo-documents comprising a similar distribution of words as the original documents, and a training module to, when executed by the processor, train the device to classify textual documents based on the pseudo-documents.
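
    A minimal sketch of the distribution-based variant in this abstract: estimate the word distribution of the originals, then draw words from it so that the pseudo-documents follow a similar distribution. The unigram model, the fixed pseudo-document length, and the function names are assumptions; the abstract does not specify how the distribution is modelled.

```python
import random
from collections import Counter


def word_distribution(documents):
    """Count word occurrences across all original documents (a unigram model)."""
    return Counter(word for text in documents for word in text.lower().split())


def sample_pseudo_documents(counts, num_docs, length, seed=0):
    """Draw words in proportion to their observed frequency to form pseudo-documents."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    words = list(counts)
    weights = [counts[w] for w in words]
    return [
        " ".join(rng.choices(words, weights=weights, k=length))
        for _ in range(num_docs)
    ]


# Hypothetical original documents.
originals = ["the cat sat on the mat", "the dog sat on the log"]
counts = word_distribution(originals)
pseudo_docs = sample_pseudo_documents(counts, num_docs=3, length=6)
```

    Frequent words such as "the" are drawn more often, so longer pseudo-documents approach the word frequencies of the originals.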


    COMPUTING A MOMENT FOR CATEGORIZING A DOCUMENT
    5.
    Invention application
    COMPUTING A MOMENT FOR CATEGORIZING A DOCUMENT (under examination, published)

    Publication No.: US20140379713A1

    Publication date: 2014-12-25

    Application No.: US13923500

    Filing date: 2013-06-21

    CPC classification number: G06F16/353

    Abstract: For documents in a collection, respective data structures containing information representing occurrence of terms in the corresponding documents are generated. For a first one of the documents, at least one moment is computed based on the information in the data structure corresponding to the first document, where the at least one moment represents at least one characteristic of a distribution of values derived from the information in the data structure corresponding to the first document. The at least one moment is useable to categorize the first document into one of a plurality of classes of documents.
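
    As an illustration of the abstract above, the sketch below builds a term-occurrence structure for one document and computes a few moments of the resulting count distribution. Treating the per-term counts as the "values", and choosing mean, variance, and skewness as the moments, are assumptions; the abstract does not fix either choice.

```python
from collections import Counter


def term_counts(text):
    """Data structure recording how often each term occurs in a document."""
    return Counter(text.lower().split())


def moments(values):
    """Return (mean, population variance, skewness) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        return mean, 0.0, 0.0  # skewness is undefined for constant data; report 0
    std = variance ** 0.5
    skewness = sum(((v - mean) / std) ** 3 for v in values) / n
    return mean, variance, skewness


# Hypothetical document: counts are spam=3, offer=2, now=1.
counts = term_counts("spam spam spam offer offer now")
mean, variance, skewness = moments(list(counts.values()))
```

    Moments like these could then be passed as numeric features to whatever classifier assigns the document to a class.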


    DETERMINING TOPIC RELEVANCE OF AN EMAIL THREAD
    6.
    Invention application
    DETERMINING TOPIC RELEVANCE OF AN EMAIL THREAD (under examination, published)

    Publication No.: US20160080303A1

    Publication date: 2016-03-17

    Application No.: US14786350

    Filing date: 2013-07-30

    CPC classification number: H04L51/16 G06Q10/107

    Abstract: A method for determining topic relevance of an email thread with an electronic device is described. The method includes removing redundancy from email messages in an email thread, grouping a number of email threads into a number of email clusters, identifying high information gain terms for each email cluster, identifying topic terms for each email cluster from the high information gain terms and determining a relevance of the number of email threads in an email cluster based on the topic terms for the email cluster and a threshold number of email messages in an email thread.
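
    The steps in this abstract can be sketched roughly as below. Several details are assumptions: redundancy is taken to mean quoted reply lines starting with ">", the clustering step is replaced by a precomputed `cluster_of` mapping, topic terms are simply the top-ranked terms by information gain, and the message threshold is read as a minimum message count below which a thread scores zero.

```python
import math


def strip_quoted(message):
    """Drop quoted reply lines (assumed to start with '>') to remove redundancy."""
    kept = [ln for ln in message.splitlines() if not ln.lstrip().startswith(">")]
    return " ".join(kept)


def thread_tokens(messages):
    """Token set of a thread after redundancy removal."""
    return set(" ".join(strip_quoted(m) for m in messages).lower().split())


def entropy(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(term, token_sets, member):
    """Reduction in cluster-membership entropy from knowing whether a thread has the term."""
    n = len(token_sets)
    gain = entropy(sum(member) / n)
    for present in (True, False):
        idx = [i for i in range(n) if (term in token_sets[i]) == present]
        if idx:
            gain -= len(idx) / n * entropy(sum(member[i] for i in idx) / len(idx))
    return gain


def topic_terms(cluster_id, tokens, cluster_of, top_k=1):
    """Rank a cluster's terms by information gain and keep the top_k as topic terms."""
    ids = sorted(tokens)
    sets = [tokens[i] for i in ids]
    member = [cluster_of[i] == cluster_id for i in ids]
    candidates = set().union(*(s for s, m in zip(sets, member) if m))
    ranked = sorted(candidates, key=lambda t: (-information_gain(t, sets, member), t))
    return ranked[:top_k]


def relevance(thread_id, threads, tokens, topics, min_messages=2):
    """Share of topic terms found in the thread; 0 if the thread has too few messages."""
    if len(threads[thread_id]) < min_messages:
        return 0.0
    return len(tokens[thread_id] & set(topics)) / len(topics)


# Hypothetical threads and an assumed result of the separate clustering step.
threads = {
    "t1": ["Budget review on Friday", "> Budget review on Friday\nBudget numbers attached"],
    "t2": ["Budget forecast draft", "> Budget forecast draft\nForecast looks fine"],
    "t3": ["Lunch menu today"],
    "t4": ["Lunch order for team", "> Lunch order for team\nOrder placed"],
}
cluster_of = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
tokens = {tid: thread_tokens(msgs) for tid, msgs in threads.items()}
budget_topics = topic_terms(0, tokens, cluster_of)
lunch_topics = topic_terms(1, tokens, cluster_of)
```

    Here "budget" separates cluster 0 from cluster 1 perfectly and therefore has the highest information gain, while the single-message thread "t3" falls below the assumed message threshold.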


    Generating Training Documents
    7.
    Invention application
    Generating Training Documents (in force)

    Publication No.: US20140177948A1

    Publication date: 2014-06-26

    Application No.: US13725487

    Filing date: 2012-12-21

    CPC classification number: G06K9/6256 G06F17/30598 G06N99/005

    Abstract: A method of generating training documents for training a classifying device comprises, with a processor, determining a number of sub-samples in a number of original documents, and creating a number of pseudo-documents from the sub-samples, the pseudo-documents comprising a portion of the number of sub-samples. A device for training a classifying device comprises a processor, and a memory communicatively coupled to the processor. The memory comprises a sampling module to, when executed by the processor, determine a number of sub-samples in a number of original documents, a pseudo-document creation module to, when executed by the processor, create a number of pseudo-documents from the sub-samples, the pseudo-documents comprising a portion of the number of sub-samples, and a training module to, when executed by the processor, train a classifying device to classify textual documents based on the pseudo-documents.

