System and method for domain adaption with partial observation
    1.
    Type: Granted patent
    Status: Active

    Publication No.: US08856050B2

    Publication date: 2014-10-07

    Application No.: US13006245

    Filing date: 2011-01-13

    IPC classification: G06F15/18

    CPC classification: G06N99/005 G06F17/3071

    Abstract: A novel domain adaption/transfer learning method applied to the problem of classifying abbreviated documents, e.g., short text messages, instant messages, tweets. The method uses a large number of multi-labeled examples (source domain) to improve the learning on the partial observations (target domain). Specifically, a hidden, higher-level abstraction space is learned that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error in a classification model learned in the hidden space using known labels from the source domain. The partial observations in the target space are then mapped to the same hidden space, and classified into the label space determined by the source domain.
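
The joint objective described in this abstract (reconstruction error plus classification error in a shared hidden space) can be illustrated with a minimal sketch. The abstract does not specify the model, so everything here is an assumption made for illustration: linear maps `A`, `B`, `C`, squared-error losses, the weight `lam`, plain gradient descent, and the toy data. It is not the patented method.

```python
import numpy as np

def joint_loss(X, Y, A, B, C, lam):
    """Reconstruction error plus lam times classification error (both squared)."""
    Z = X @ A                       # documents mapped into the hidden space
    return float(np.sum((Z @ B - X) ** 2) + lam * np.sum((Z @ C - Y) ** 2))

def fit_hidden_space(X, Y, hidden=2, lam=1.0, lr=1e-3, steps=2000, seed=0):
    """Gradient descent on the joint loss; returns encoder A, decoder B, classifier C."""
    rng = np.random.default_rng(seed)
    d, k = X.shape[1], Y.shape[1]
    A = 0.1 * rng.standard_normal((d, hidden))
    B = 0.1 * rng.standard_normal((hidden, d))
    C = 0.1 * rng.standard_normal((hidden, k))
    for _ in range(steps):
        Z = X @ A
        rec = Z @ B - X             # reconstruction residual
        cls = Z @ C - Y             # classification residual
        grad_A = 2 * X.T @ (rec @ B.T + lam * cls @ C.T)
        grad_B = 2 * Z.T @ rec
        grad_C = 2 * lam * Z.T @ cls
        A -= lr * grad_A
        B -= lr * grad_B
        C -= lr * grad_C
    return A, B, C

# Hypothetical toy source domain: 6 documents, 4 vocabulary words, 2 labels.
X_src = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1],
                  [0, 0, 1, 2], [1, 1, 1, 0], [0, 1, 1, 1]], dtype=float)
Y_src = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]], dtype=float)
# Hypothetical partial observations (very short target documents).
X_tgt = np.array([[1, 0, 0, 0], [0, 0, 0, 1]], dtype=float)

A, B, C = fit_hidden_space(X_src, Y_src)
target_scores = X_tgt @ A @ C       # same hidden space, source label space
```

The target documents are never used for training in this sketch; they are only projected through the encoder learned on the source domain and scored against the source labels.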

    GRAPH-BASED FRAMEWORK FOR MULTI-TASK MULTI-VIEW LEARNING
    2.
    Type: Patent application
    Status: Active

    Publication No.: US20130325756A1

    Publication date: 2013-12-05

    Application No.: US13488885

    Filing date: 2012-06-05

    IPC classification: G06F15/18

    CPC classification: G06K9/628

    Abstract: A system and method for a Multi-Task Multi-View (M2TV) learning problem. The method uses the label information from related tasks to make up for the lack of labeled data in a single task. The method further uses the consistency among different views to improve the performance. It is tailored for the above complicated dual heterogeneous problems where multiple related tasks have both shared and task-specific views (features), since it makes full use of the available information.

    INFERRING EMERGING AND EVOLVING TOPICS IN STREAMING TEXT
    3.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20130151525A1

    Publication date: 2013-06-13

    Application No.: US13616403

    Filing date: 2012-09-14

    IPC classification: G06F17/30

    CPC classification: G06F17/2785 G06F16/316

    Abstract: A method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify evolving topics and emerging topics. The matrices include a matrix X identifying a multitude of words in each of the documents, a matrix W identifying a multitude of topics in each of the documents, and a matrix H identifying a multitude of words for each of the multitude of topics. These matrices are analyzed to identify the evolving and emerging topics. In an embodiment, two forms of temporal regularizers are used to help identify the evolving and emerging topics. In another embodiment, a two stage approach involving detection and clustering is used to help identify the evolving and emerging topics.
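
The three matrices named in this abstract suggest a factorization of the form X ≈ WH, with X as documents by words, W as documents by topics, and H as topics by words. The sketch below is an assumption-laden illustration: it uses standard non-negative matrix factorization with multiplicative updates, which the abstract does not name, and it omits the temporal regularizers entirely. The function name `factorize` and the toy matrix are hypothetical.

```python
import numpy as np

def factorize(X, n_topics, steps=200, seed=0, eps=1e-9):
    """Approximate X (docs x words) as W (docs x topics) @ H (topics x words)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_words)) + eps
    for _ in range(steps):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical word counts: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
X = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 4, 1],
              [0, 0, 1, 4]], dtype=float)

W, H = factorize(X, n_topics=2)
reconstruction_error = float(np.sum((X - W @ H) ** 2))
```

In a streaming setting one would refit or update W and H as new documents arrive; how topics are then tracked over time is exactly the part this sketch leaves out.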

    GRAPH-BASED TRANSFER LEARNING
    4.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20130013540A1

    Publication date: 2013-01-10

    Application No.: US13619142

    Filing date: 2012-09-14

    IPC classification: G06F15/18

    CPC classification: G06N99/005

    Abstract: Transfer learning is the task of leveraging the information from labeled examples in some domains to predict the labels for examples in another domain. It finds abundant practical applications, such as sentiment prediction, image classification and network intrusion detection. A graph-based transfer learning framework propagates label information from a source domain to a target domain via the example-feature-example tripartite graph, and puts more emphasis on the labeled examples from the target domain via the example-example bipartite graph. An iterative algorithm renders the framework scalable to large-scale applications. The framework propagates the label information to both features irrelevant to the source domain and unlabeled examples in the target domain via common features in a principled way.
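
As a rough illustration of propagating labels between examples through shared features, the sketch below runs a generic iterative label-propagation scheme on an example-to-example affinity derived from a binary example-by-feature matrix. This is a standard textbook technique swapped in for illustration, not the framework claimed in the abstract; `propagate_labels`, `alpha`, and the toy data are all hypothetical.

```python
import numpy as np

def propagate_labels(X, y, alpha=0.8, steps=50):
    """X: examples x features (binary). y: +1 / -1 for labeled, 0 for unlabeled."""
    affinity = X @ X.T                      # examples linked through shared features
    np.fill_diagonal(affinity, 0.0)
    degree = affinity.sum(axis=1)
    degree[degree == 0] = 1.0               # avoid division by zero for isolated examples
    scale = 1.0 / np.sqrt(degree)
    S = scale[:, None] * affinity * scale[None, :]   # symmetric normalization
    f = y.astype(float).copy()
    for _ in range(steps):
        # Mix scores from neighbours with the original labels at every step.
        f = alpha * (S @ f) + (1 - alpha) * y
    return f

# Rows 0-1: labeled source examples. Rows 2-3: unlabeled target examples.
X = np.array([[1, 1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0, 0],
              [0, 1, 0, 0, 1, 0],    # shares feature 1 with row 0
              [0, 0, 0, 1, 0, 1]],   # shares feature 3 with row 1
             dtype=float)
y = np.array([1.0, -1.0, 0.0, 0.0])

scores = propagate_labels(X, y)
predicted = np.sign(scores)          # sign of the score gives the predicted label
```

Each iteration costs one sparse-friendly matrix-vector product, which is the usual reason iterative schemes of this kind scale to large graphs.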

    SYSTEM AND METHOD FOR DOMAIN ADAPTION WITH PARTIAL OBSERVATION
    5.
    Type: Patent application
    Status: Active

    Publication No.: US20120185415A1

    Publication date: 2012-07-19

    Application No.: US13006245

    Filing date: 2011-01-13

    IPC classification: G06F15/18

    CPC classification: G06N99/005 G06F17/3071

    Abstract: A system, method and computer program product provide a novel domain adaption/transfer learning approach applied to the problem of classifying abbreviated documents, e.g., short text messages, instant messages, tweets. The proposed method uses a large number of multi-labeled examples (source domain) to improve the learning on the partial observations (target domain). Specifically, a hidden, higher-level abstraction space is learned that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error in a classification model learned in the hidden space using known labels from the source domain. The partial observations in the target space are then mapped to the same hidden space, and classified into the label space determined by the source domain. Exemplary results provided for a Twitter dataset demonstrate that the method identifies meaningful hidden topics and provides useful classifications of specific tweets.

    Graph-based transfer learning
    6.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20110320387A1

    Publication date: 2011-12-29

    Application No.: US12938063

    Filing date: 2010-11-02

    IPC classification: G06F15/18

    CPC classification: G06N20/00

    Abstract: Transfer learning is the task of leveraging the information from labeled examples in some domains to predict the labels for examples in another domain. It finds abundant practical applications, such as sentiment prediction, image classification and network intrusion detection. A graph-based transfer learning framework propagates label information from a source domain to a target domain via the example-feature-example tripartite graph, and puts more emphasis on the labeled examples from the target domain via the example-example bipartite graph. An iterative algorithm renders the framework scalable to large-scale applications. The framework propagates the label information to both features irrelevant to the source domain and unlabeled examples in the target domain via common features in a principled way.

    INFERRING EMERGING AND EVOLVING TOPICS IN STREAMING TEXT
    8.
    Type: Patent application
    Status: Active

    Publication No.: US20130151520A1

    Publication date: 2013-06-13

    Application No.: US13315798

    Filing date: 2011-12-09

    IPC classification: G06F17/30

    CPC classification: G06F17/2785 G06F17/30619

    Abstract: A method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify a first group of topics as evolving topics and a second group of topics as emerging topics. The matrices include a first matrix X identifying a multitude of words in each of the documents, a second matrix W identifying a multitude of topics in each of the documents, and a third matrix H identifying a multitude of words for each of the multitude of topics. These matrices are analyzed to identify the evolving and emerging topics. In an embodiment, the documents form a streaming dataset, and two forms of temporal regularizers are used to help identify the evolving topics and the emerging topics in the streaming dataset.

    METHOD AND SYSTEM USING MACHINE LEARNING TO AUTOMATICALLY DISCOVER HOME PAGES ON THE INTERNET
    9.
    Type: Patent application
    Status: Active

    Publication No.: US20090210419A1

    Publication date: 2009-08-20

    Application No.: US12033160

    Filing date: 2008-02-19

    IPC classification: G06F17/30

    CPC classification: G06F17/30864

    Abstract: A method for automatically determining an Internet home page corresponding to a named entity identified by a specified descriptor including building a trained machine-learning model, generating candidate matches from the specified descriptor, wherein each candidate match includes an Internet address, extracting content-based features from websites associated with the Internet addresses of the candidate matches, determining a model score for each candidate match based on the content-based features using the trained machine-learning model, and determining a match from among the candidate matches according to the scores, wherein the match is returned as the Internet home page corresponding to the named entity.
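
The pipeline in this abstract (candidates, content-based features, model score, best match) can be sketched as follows. The abstract does not say which features or model are used, so the three features, the fixed `WEIGHTS` standing in for a trained model, the `Candidate` class, and the example pages are all hypothetical. Page text is supplied directly so the sketch needs no network access.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    url: str
    page_text: str          # content already fetched from the candidate address

def extract_features(descriptor: str, cand: Candidate) -> list[float]:
    """Hypothetical content-based features for one candidate."""
    tokens = descriptor.lower().split()
    n = max(len(tokens), 1)             # guard against an empty descriptor
    text, url = cand.page_text.lower(), cand.url.lower()
    return [
        sum(tok in text for tok in tokens) / n,   # share of name tokens in the page
        sum(tok in url for tok in tokens) / n,    # share of name tokens in the URL
        1.0 if "official" in text else 0.0,       # page describes itself as official
    ]

# Stand-in for a trained model: fixed weights of a linear scorer (assumed values).
WEIGHTS = [1.0, 2.0, 0.5]

def model_score(features: list[float]) -> float:
    return sum(w * f for w, f in zip(WEIGHTS, features))

def pick_home_page(descriptor: str, candidates: list[Candidate]) -> str:
    """Return the URL of the highest-scoring candidate."""
    best = max(candidates, key=lambda c: model_score(extract_features(descriptor, c)))
    return best.url

candidates = [
    Candidate("http://acme-widgets.example/", "Acme Widgets official site"),
    Candidate("http://news.example/story", "Report mentions Acme in passing"),
]
home_page = pick_home_page("Acme Widgets", candidates)
```

In practice the weights would come from fitting a classifier on labeled (entity, home page) pairs, and the candidate list from a search query built from the descriptor.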

    Graph-based transfer learning
    10.
    Type: Granted patent
    Status: Active

    Publication No.: US09477929B2

    Publication date: 2016-10-25

    Application No.: US13619142

    Filing date: 2012-09-14

    IPC classification: G06F5/00 G06N5/00 G06N99/00

    CPC classification: G06N99/005

    Abstract: Transfer learning is the task of leveraging the information from labeled examples in some domains to predict the labels for examples in another domain. It finds abundant practical applications, such as sentiment prediction, image classification and network intrusion detection. A graph-based transfer learning framework propagates label information from a source domain to a target domain via the example-feature-example tripartite graph, and puts more emphasis on the labeled examples from the target domain via the example-example bipartite graph. An iterative algorithm renders the framework scalable to large-scale applications. The framework propagates the label information to both features irrelevant to the source domain and unlabeled examples in the target domain via common features in a principled way.
