Abstract:
A novel domain adaptation/transfer learning method is applied to the problem of classifying abbreviated documents, e.g., short text messages, instant messages, and tweets. The method uses a large number of multi-labeled examples (source domain) to improve the learning on the partial observations (target domain). Specifically, a hidden, higher-level abstraction space is learned that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error in a classification model learned in the hidden space using known labels from the source domain. The partial observations in the target space are then mapped to the same hidden space, and classified into the label space determined by the source domain.
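The joint objective described above can be written down as a rough sketch. This is one plausible formulation, not the one claimed in the source, and every symbol is an assumption introduced for illustration: X_s is the source document-term matrix, Y_s the known source labels, H_s the hidden representation of the source documents, D a dictionary mapping the hidden space back to words, f_theta a classifier on the hidden space, lambda a trade-off weight, and M a mask marking the observed entries of the target documents X_t.

```latex
% Source domain: learn the hidden space by jointly minimizing
% reconstruction error and classification error.
\min_{H_s,\, D,\, \theta} \;
  \lVert X_s - H_s D \rVert_F^2
  \;+\; \lambda \, \mathcal{L}\bigl(Y_s,\, f_\theta(H_s)\bigr)

% Target domain: map the partial observations into the same hidden space,
% then classify with the classifier learned on the source domain.
H_t = \arg\min_{H} \; \lVert M \odot (X_t - H D) \rVert_F^2,
\qquad
\hat{Y}_t = f_\theta(H_t)
```

The first term keeps the hidden space faithful to the documents, while the second makes it useful for predicting the source labels.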
Abstract:
A system and method address a Multi-Task Multi-View (M2TV) learning problem. The method uses the label information from related tasks to make up for the lack of labeled data in a single task. The method further uses the consistency among different views to improve performance. It is tailored for complicated dual-heterogeneous problems in which multiple related tasks have both shared and task-specific views (features), since it makes full use of the available information.
Abstract:
A method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify evolving topics and emerging topics. The matrices include a matrix X identifying a multitude of words in each of the documents, a matrix W identifying a multitude of topics in each of the documents, and a matrix H identifying a multitude of words for each of the multitude of topics. These matrices are analyzed to identify the evolving and emerging topics. In an embodiment, two forms of temporal regularizers are used to help identify the evolving and emerging topics. In another embodiment, a two-stage approach involving detection and clustering is used to help identify the evolving and emerging topics.
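The roles of X, W and H can be illustrated with a plain non-negative matrix factorization, X ≈ WH. The following is a minimal sketch under stated assumptions: it uses the standard multiplicative-update rule for the squared-error objective, it omits the temporal regularizers and the detection/clustering stage mentioned above, and the toy matrix and all function names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Squared reconstruction error ||X - WH||_F^2."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


def nmf(x, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor X (documents x words) into W (documents x topics) and H (topics x words)."""
    rng = random.Random(seed)
    n_docs, n_words = len(x), len(x[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_words)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_words)]
             for k in range(n_topics)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


# Hypothetical word counts: rows are documents, columns are words.
X = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]
W, H = nmf(X, n_topics=2)
print("reconstruction error:", frobenius_error(X, W, H))
```

Each row of W gives the topic weights of one document and each row of H gives the word weights of one topic. A streaming variant would refit as new documents arrive, which is where temporal regularization on H would come in.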
Abstract:
Transfer learning is the task of leveraging the information from labeled examples in some domains to predict the labels for examples in another domain. It finds abundant practical applications, such as sentiment prediction, image classification and network intrusion detection. A graph-based transfer learning framework propagates label information from a source domain to a target domain via the example-feature-example tripartite graph, and puts more emphasis on the labeled examples from the target domain via the example-example bipartite graph. An iterative algorithm renders the framework scalable to large-scale applications. The framework propagates the label information to both features irrelevant to the source domain and unlabeled examples in the target domain via common features in a principled way.
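The propagation idea can be sketched on a toy example-feature-example graph. This is a simplified averaging scheme, not the framework's actual algorithm: it omits the example-example bipartite graph and any weighting of labeled target examples, and the documents, words and function name are all hypothetical.

```python
def propagate_labels(docs, labels, n_iter=50):
    """Spread label scores between examples and features.

    docs:   dict mapping example id -> set of feature names
    labels: dict mapping labeled example id -> +1.0 or -1.0
    Returns (example scores, feature scores).
    """
    features = sorted({f for feats in docs.values() for f in feats})
    docs_by_feature = {f: [d for d in docs if f in docs[d]] for f in features}
    doc_score = {d: labels.get(d, 0.0) for d in docs}
    feat_score = {f: 0.0 for f in features}
    for _ in range(n_iter):
        # Each feature takes the mean score of the examples that contain it.
        for f in features:
            neighbors = docs_by_feature[f]
            feat_score[f] = sum(doc_score[d] for d in neighbors) / len(neighbors)
        # Each unlabeled example takes the mean score of its features;
        # labeled examples stay clamped to their known labels.
        for d, feats in docs.items():
            if d in labels or not feats:
                continue
            doc_score[d] = sum(feat_score[f] for f in feats) / len(feats)
    return doc_score, feat_score


# Hypothetical data: two labeled source reviews, three unlabeled target reviews.
docs = {
    "src_pos": {"great", "excellent"},
    "src_neg": {"awful", "boring"},
    "tgt_1": {"great", "durable"},
    "tgt_2": {"awful", "flimsy"},
    "tgt_3": {"durable"},
}
labels = {"src_pos": 1.0, "src_neg": -1.0}
doc_score, feat_score = propagate_labels(docs, labels)
for name in ("tgt_1", "tgt_2", "tgt_3"):
    print(name, "positive" if doc_score[name] > 0 else "negative")
```

Here "durable" never appears in the source domain, yet it receives a positive score through "tgt_1", which shares the common feature "great" with a labeled source example. That score in turn labels "tgt_3", which has no feature in common with the source domain at all.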
Abstract:
A system, method and computer program product provide a novel domain adaptation/transfer learning approach applied to the problem of classifying abbreviated documents, e.g., short text messages, instant messages, and tweets. The proposed method uses a large number of multi-labeled examples (source domain) to improve the learning on the partial observations (target domain). Specifically, a hidden, higher-level abstraction space is learned that is meaningful for the multi-labeled examples in the source domain. This is done by simultaneously minimizing the document reconstruction error and the error in a classification model learned in the hidden space using known labels from the source domain. The partial observations in the target space are then mapped to the same hidden space, and classified into the label space determined by the source domain. Exemplary results provided for a Twitter dataset demonstrate that the method identifies meaningful hidden topics and provides useful classifications of specific tweets.
Abstract:
The present invention employs data processing systems to handle debt collection by formulating the collections process as a Markov Decision Process with constrained resources, thus making it possible to automatically generate an optimal collections policy that maximizes long-term expected return throughout the course of a collections process, subject to constraints on the available resources, possibly in multiple organizations. This is accomplished by coupling data modeling and resource optimization within the constrained Markov Decision Process formulation and generating optimized rules based on a constrained reinforcement learning process applied on the basis of historical data.
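The interplay between expected return and limited resources can be illustrated with a toy model. The sketch below is a simplification, not the invention's procedure: it runs ordinary value iteration on a known model instead of learning from historical data, and it approximates the resource constraint with a fixed per-unit penalty (a Lagrangian-style relaxation). Every state, action, probability, reward and cost is made up for illustration.

```python
# Toy collections model; all numbers are hypothetical.
# state -> action -> list of (probability, next_state, reward, resource_cost)
TRANSITIONS = {
    "delinquent": {
        "wait": [(0.1, "paid", 100.0, 0.0), (0.9, "delinquent", 0.0, 0.0)],
        "call": [(0.4, "paid", 100.0, 1.0), (0.6, "delinquent", 0.0, 1.0)],
    },
    "paid": {},  # terminal state: no further actions
}


def value_iteration(transitions, penalty=0.0, gamma=0.9, n_iter=200):
    """Return (state values, greedy policy) for a given resource penalty.

    penalty is the price charged per unit of resource used; raising it
    steers the policy away from resource-hungry actions.
    """
    values = {s: 0.0 for s in transitions}
    policy = {}
    for _ in range(n_iter):
        for state, actions in transitions.items():
            if not actions:
                continue  # terminal states keep value 0
            best = None
            for action, outcomes in actions.items():
                q = sum(p * (r - penalty * c + gamma * values[nxt])
                        for p, nxt, r, c in outcomes)
                if best is None or q > best[0]:
                    best = (q, action)
            values[state], policy[state] = best
    return values, policy


_, free_policy = value_iteration(TRANSITIONS, penalty=0.0)
_, costly_policy = value_iteration(TRANSITIONS, penalty=80.0)
print("resources free:  ", free_policy["delinquent"])
print("resources scarce:", costly_policy["delinquent"])
```

In a constrained setting the penalty would not be fixed by hand. It would be tuned, for example by a search over its value, until the resulting policy's expected resource use fits within the available capacity.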
Abstract:
A method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify a first group of topics as evolving topics and a second group of topics as emerging topics. The matrices include a first matrix X identifying a multitude of words in each of the documents, a second matrix W identifying a multitude of topics in each of the documents, and a third matrix H identifying a multitude of words for each of the multitude of topics. These matrices are analyzed to identify the evolving and emerging topics. In an embodiment, the documents form a streaming dataset, and two forms of temporal regularizers are used to help identify the evolving topics and the emerging topics in the streaming dataset.
Abstract:
A method for automatically determining an Internet home page corresponding to a named entity identified by a specified descriptor includes building a trained machine-learning model; generating candidate matches from the specified descriptor, wherein each candidate match includes an Internet address; extracting content-based features from websites associated with the Internet addresses of the candidate matches; determining a model score for each candidate match based on the content-based features using the trained machine-learning model; and determining a match from among the candidate matches according to the scores, wherein the match is returned as the Internet home page corresponding to the named entity.