INFERRING EMERGING AND EVOLVING TOPICS IN STREAMING TEXT
    3.
    Patent application
    INFERRING EMERGING AND EVOLVING TOPICS IN STREAMING TEXT (legal status: in force)

    Publication No.: US20130151520A1

    Publication date: 2013-06-13

    Application No.: US13315798

    Filing date: 2011-12-09

    IPC classification: G06F17/30

    CPC classification: G06F17/2785 G06F17/30619

    Abstract: A method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify a first group of topics as evolving topics and a second group of topics as emerging topics. The matrices include a first matrix X identifying a multitude of words in each of the documents, a second matrix W identifying a multitude of topics in each of the documents, and a third matrix H identifying a multitude of words for each of the multitude of topics. These matrices are analyzed to identify the evolving and emerging topics. In an embodiment, the documents form a streaming dataset, and two forms of temporal regularizers are used to help identify the evolving topics and the emerging topics in the streaming dataset.
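The three matrices named in the abstract match the shape of a non-negative matrix factorization, in which a documents-by-words matrix X is approximated by the product of a documents-by-topics matrix W and a topics-by-words matrix H. The sketch below is a minimal illustration under that assumption. It uses the standard multiplicative update rules for plain NMF, not the patent's method: the temporal regularizers mentioned in the abstract are not specified here and are left out. The function names, the toy corpus and the parameter values are hypothetical.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative X (docs x words) by W (docs x topics) times H (topics x words)."""
    rng = random.Random(seed)
    W = [[rng.random() + eps for _ in range(k)] for _ in X]
    H = [[rng.random() + eps for _ in X[0]] for _ in range(k)]
    for _ in range(iters):
        # Multiplicative update for H: H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # Multiplicative update for W: W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def frobenius_error(X, W, H):
    """Frobenius norm of the residual X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for xr, yr in zip(X, WH) for x, y in zip(xr, yr)) ** 0.5

# Toy corpus (hypothetical): 4 documents over 6 words, two word groups.
X = [
    [3.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    [2.0, 3.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0, 1.0, 2.0],
    [0.0, 0.0, 0.0, 2.0, 3.0, 1.0],
]

W, H = nmf(X, k=2)
error = frobenius_error(X, W, H)
```

Each row of W gives the topic weights of one document and each row of H gives the word weights of one topic. A streaming variant would refit on each new batch of documents while constraining how H changes over time, which is where regularizers of the kind the abstract mentions would enter.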


    Inferring emerging and evolving topics in streaming text
    4.
    Granted patent
    Inferring emerging and evolving topics in streaming text (legal status: in force)

    Publication No.: US08909643B2

    Publication date: 2014-12-09

    Application No.: US13315798

    Filing date: 2011-12-09

    IPC classification: G06F17/30

    CPC classification: G06F17/2785 G06F17/30619

    Abstract: A method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify a first group of topics as evolving topics and a second group of topics as emerging topics. The matrices include a first matrix X identifying a multitude of words in each of the documents, a second matrix W identifying a multitude of topics in each of the documents, and a third matrix H identifying a multitude of words for each of the multitude of topics. These matrices are analyzed to identify the evolving and emerging topics. In an embodiment, the documents form a streaming dataset, and two forms of temporal regularizers are used to help identify the evolving topics and the emerging topics in the streaming dataset.


    INFERRING EMERGING AND EVOLVING TOPICS IN STREAMING TEXT
    5.
    Patent application
    INFERRING EMERGING AND EVOLVING TOPICS IN STREAMING TEXT (legal status: under examination, published)

    Publication No.: US20130151525A1

    Publication date: 2013-06-13

    Application No.: US13616403

    Filing date: 2012-09-14

    IPC classification: G06F17/30

    CPC classification: G06F17/2785 G06F16/316

    Abstract: A method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify evolving topics and emerging topics. The matrices include a matrix X identifying a multitude of words in each of the documents, a matrix W identifying a multitude of topics in each of the documents, and a matrix H identifying a multitude of words for each of the multitude of topics. These matrices are analyzed to identify the evolving and emerging topics. In an embodiment, two forms of temporal regularizers are used to help identify the evolving and emerging topics. In another embodiment, a two-stage approach involving detection and clustering is used to help identify the evolving and emerging topics.


    System and method for domain adaption with partial observation
    6.
    Granted patent
    System and method for domain adaption with partial observation (legal status: in force)

    Publication No.: US08856050B2

    Publication date: 2014-10-07

    Application No.: US13006245

    Filing date: 2011-01-13

    IPC classification: G06F15/18

    CPC classification: G06N99/005 G06F17/3071

    Abstract: A novel domain adaption/transfer learning method applied to the problem of classifying abbreviated documents, e.g., short text messages, instant messages, tweets. The method uses a large number of multi-labeled examples (source domain) to improve the learning on the partial observations (target domain). Specifically, a hidden, higher-level abstraction space is learned that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error in a classification model learned in the hidden space using known labels from the source domain. The partial observations in the target space are then mapped to the same hidden space, and classified into the label space determined by the source domain.
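The joint minimization described in the abstract can be written schematically as below. This is an illustrative form only, not the formulation claimed in the patent. The symbols are assumptions introduced here: $X_s$ (source documents), $Y_s$ (their known labels), $Z_s$ (hidden-space representation), $B$ (a linear map from hidden space back to words), $f_\theta$ (a classifier with parameters $\theta$), $\ell$ (a classification loss) and $\lambda$ (a trade-off weight).

```latex
\min_{Z_s,\,B,\,\theta}\;
  \underbrace{\lVert X_s - Z_s B \rVert_F^2}_{\text{document reconstruction error}}
  \;+\; \lambda\,
  \underbrace{\ell\bigl(Y_s,\, f_\theta(Z_s)\bigr)}_{\text{classification error in the hidden space}}
```

Under the same assumptions, a partially observed target document $x_t$ would be mapped to a hidden vector $z_t$ by fitting $z_t B$ to the observed entries of $x_t$ only, and then labeled with $f_\theta(z_t)$.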


    SYSTEM AND METHOD FOR DOMAIN ADAPTION WITH PARTIAL OBSERVATION
    7.
    Patent application
    SYSTEM AND METHOD FOR DOMAIN ADAPTION WITH PARTIAL OBSERVATION (legal status: in force)

    Publication No.: US20120185415A1

    Publication date: 2012-07-19

    Application No.: US13006245

    Filing date: 2011-01-13

    IPC classification: G06F15/18

    CPC classification: G06N99/005 G06F17/3071

    Abstract: A system, method and computer program product provide a novel domain adaption/transfer learning approach applied to the problem of classifying abbreviated documents, e.g., short text messages, instant messages, tweets. The proposed method uses a large number of multi-labeled examples (source domain) to improve the learning on the partial observations (target domain). Specifically, a hidden, higher-level abstraction space is learned that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error in a classification model learned in the hidden space using known labels from the source domain. The partial observations in the target space are then mapped to the same hidden space, and classified into the label space determined by the source domain. Exemplary results provided for a Twitter dataset demonstrate that the method identifies meaningful hidden topics and provides useful classifications of specific tweets.


    System and method for domain adaption with partial observation
    8.
    Granted patent

    Publication No.: US08856052B2

    Publication date: 2014-10-07

    Application No.: US13618603

    Filing date: 2012-09-14

    IPC classification: G06F15/18

    CPC classification: G06N99/005 G06F17/3071

    Abstract: A novel domain adaption/transfer learning method applied to the problem of classifying abbreviated documents, e.g., short text messages, instant messages, tweets. The method uses a large number of multi-labeled examples (source domain) to improve the learning on the partial observations (target domain). Specifically, a hidden, higher-level abstraction space is learned that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error in a classification model learned in the hidden space using known labels from the source domain. The partial observations in the target space are then mapped to the same hidden space, and classified into the label space determined by the source domain.

    SYSTEM AND METHOD FOR DOMAIN ADAPTION WITH PARTIAL OBSERVATION
    9.
    Patent application
    SYSTEM AND METHOD FOR DOMAIN ADAPTION WITH PARTIAL OBSERVATION (legal status: in force)

    Publication No.: US20130013539A1

    Publication date: 2013-01-10

    Application No.: US13618603

    Filing date: 2012-09-14

    IPC classification: G06F15/18

    CPC classification: G06N99/005 G06F17/3071

    Abstract: A system, method and computer program product provide a novel domain adaption/transfer learning approach applied to the problem of classifying abbreviated documents, e.g., short text messages, instant messages, tweets. The proposed method uses a large number of multi-labeled examples (source domain) to improve the learning on the partial observations (target domain). Specifically, a hidden, higher-level abstraction space is learned that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error in a classification model learned in the hidden space using known labels from the source domain. The partial observations in the target space are then mapped to the same hidden space, and classified into the label space determined by the source domain. Exemplary results provided for a Twitter dataset demonstrate that the method identifies meaningful hidden topics and provides useful classifications of specific tweets.


    METHOD AND SYSTEM USING MACHINE LEARNING TO AUTOMATICALLY DISCOVER HOME PAGES ON THE INTERNET
    10.
    Patent application
    METHOD AND SYSTEM USING MACHINE LEARNING TO AUTOMATICALLY DISCOVER HOME PAGES ON THE INTERNET (legal status: in force)

    Publication No.: US20090210419A1

    Publication date: 2009-08-20

    Application No.: US12033160

    Filing date: 2008-02-19

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: A method for automatically determining an Internet home page corresponding to a named entity identified by a specified descriptor, including building a trained machine-learning model; generating candidate matches from the specified descriptor, wherein each candidate match includes an Internet address; extracting content-based features from websites associated with the Internet addresses of the candidate matches; determining a model score for each candidate match based on the content-based features using the trained machine-learning model; and determining a match from among the candidate matches according to the scores, wherein the match is returned as the Internet home page corresponding to the named entity.
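The pipeline in the abstract (candidate matches, content-based features, model score, best match) can be sketched as follows. This is a minimal illustration, not the patented implementation. The abstract does not name the features or the model type, so the feature names, the logistic scoring function, the fixed weights standing in for a trained model, and the example pages are all hypothetical. Candidate generation is also assumed to have happened already, so the candidates are passed in as a dictionary of URL to page content and no network access takes place.

```python
import math

# Hypothetical weights standing in for a trained model.
WEIGHTS = {"name_in_title": 2.0, "name_in_domain": 1.5, "name_mentions": 0.3}
BIAS = -2.0

def extract_features(name, url, page):
    """Content-based features for one candidate (feature names are assumptions)."""
    key = name.lower().replace(" ", "")
    domain = url.split("//")[-1].split("/")[0].lower()
    return {
        "name_in_title": float(name.lower() in page["title"].lower()),
        "name_in_domain": float(key in domain.replace("-", "").replace(".", "")),
        # Cap the mention count so long pages do not dominate the score.
        "name_mentions": float(min(page["body"].lower().count(name.lower()), 5)),
    }

def model_score(features):
    """Logistic score in (0, 1) computed from the weighted features."""
    z = BIAS + sum(WEIGHTS[f] * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def find_home_page(name, candidates):
    """Return the candidate URL with the highest model score."""
    scored = [(model_score(extract_features(name, url, page)), url)
              for url, page in candidates.items()]
    return max(scored)[1]

# Hypothetical candidates, as a search step might return them.
candidates = {
    "https://www.acme-widgets.example/": {
        "title": "Acme Widgets | Official Site",
        "body": "Acme Widgets makes widgets. Contact Acme Widgets.",
    },
    "https://news.example/acme-review": {
        "title": "Review: are these widgets any good?",
        "body": "We tried products from Acme Widgets.",
    },
}

best = find_home_page("Acme Widgets", candidates)
```

In this toy data the first candidate matches the entity name in its title and domain, so it receives the higher score and is returned as the home page.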
