Abstract:
Documents of a set of documents are represented by bag-of-words (BOW) vectors. L labeled topics are provided, each labeled with a word list comprising words of a vocabulary that are representative of the labeled topic and possibly a list of relevant documents. Probabilistic classification of the documents generates for each labeled topic a document vector whose elements store scores of the documents for the labeled topic and a word vector whose elements store scores of the words of the vocabulary for the labeled topic. Non-negative matrix factorization (NMF) is performed to generate a document-topic model that clusters the documents into k topics where k>L. NMF factors representing L topics of the k topics are initialized to the document and word vectors for the L labeled topics. In some embodiments the NMF factors representing the L topics initialized to the document and word vectors are frozen, that is, are not updated by the NMF after the initialization.
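The seeding and freezing described above can be sketched in plain Python. This is a minimal illustration, not the patented procedure: the abstract does not name an NMF update rule, so the multiplicative (Lee-Seung style) updates, the function name `seeded_nmf`, its parameters, and the toy data are all assumptions made for the example.

```python
import random

def matmul(A, B):
    """Plain-Python product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def seeded_nmf(X, doc_vecs, word_vecs, k, iters=200, freeze=True, seed=0, eps=1e-9):
    """Factor X (n docs x m words) as W (n x k) times H (k x m).

    doc_vecs:  L lists of length n, document scores per labeled topic
    word_vecs: L lists of length m, word scores per labeled topic
    The first L topics are seeded from these vectors; if freeze is True
    they are never updated afterwards.
    """
    rng = random.Random(seed)
    n, m, L = len(X), len(X[0]), len(doc_vecs)
    assert k > L, "the model needs more topics than labeled topics"
    W = [[doc_vecs[t][i] if t < L else rng.random() for t in range(k)]
         for i in range(n)]
    H = [word_vecs[t][:] if t < L else [rng.random() for _ in range(m)]
         for t in range(k)]
    first = L if freeze else 0  # index of the first topic that gets updated
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), restricted to non-frozen rows
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        for t in range(first, k):
            H[t] = [h * a / (b + eps) for h, a, b in zip(H[t], num[t], den[t])]
        # W <- W * (X H^T) / (W H H^T), restricted to non-frozen columns
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        for i in range(n):
            for t in range(first, k):
                W[i][t] *= num[i][t] / (den[i][t] + eps)
    return W, H

# Toy BOW matrix: 4 documents x 4 vocabulary words (made-up counts).
X = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 3, 1], [0, 0, 1, 3]]
doc_vecs = [[0.9, 0.8, 0.1, 0.1]]     # L = 1 labeled topic, scores per document
word_vecs = [[0.9, 0.9, 0.05, 0.05]]  # scores per vocabulary word
W, H = seeded_nmf(X, doc_vecs, word_vecs, k=2)
```

With `freeze=True`, the first column of `W` and the first row of `H` stay exactly at the classifier scores, while the remaining k−L topics are learned freely.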
Abstract:
In multi-view learning, optimized prediction matrices are determined for V≧2 views of n objects, and a prediction of a view of an object is generated based on the optimized prediction matrix for that view. An objective is optimized over a set of parameters that includes at least the V prediction matrices and a concatenated matrix comprising a concatenation of the prediction matrices. The objective comprises a sum including at least a loss function for each view, a trace norm of the prediction matrix for each view, and a trace norm of the concatenated matrix. The set of parameters may further include a sparse matrix for each view, with the objective further including an element-wise norm of the sparse matrix for each view. The set of parameters may further include regularization parameters scaling the trace norms of the prediction matrices and the trace norm of the concatenated matrix.
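One way to write an objective with this structure is sketched below. Every symbol is an assumption introduced for illustration, since the abstract's own notation did not survive: Θ for the parameter set, X_v for the prediction matrix of view v, Y_v for the observations of that view, S_v for the optional sparse matrix, X_0 for the concatenated matrix, and λ_v, λ_0, μ_v for the regularization weights. The choice of the ℓ1 norm as the element-wise norm and the way S_v enters the loss are likewise assumptions.

```latex
\mathcal{O}(\Theta) \;=\; \sum_{v=1}^{V} \Big[ \mathcal{L}_v\big(Y_v;\, X_v, S_v\big)
    \;+\; \lambda_v \,\lVert X_v \rVert_{*}
    \;+\; \mu_v \,\lVert S_v \rVert_{1} \Big]
    \;+\; \lambda_0 \,\lVert X_0 \rVert_{*},
\qquad X_0 = [\, X_1 \;\; X_2 \;\cdots\; X_V \,]
```

Here \lVert \cdot \rVert_{*} denotes the trace norm (the sum of singular values). The per-view terms encourage each prediction matrix to be low rank, and the term on X_0 encourages structure shared across views.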
Abstract:
A method operates on observed relationship data between pairs of entities of a set of entities including entities of at least two (and optionally at least three) different entity types. An observed collective symmetric matrix is constructed in which element (n,m)=element (m,n) stores the observed relationship between entities indexed n and m when the observed relationship data includes this observed relationship. A prediction collective symmetric matrix is optimized in order to minimize a loss function comparing the observed collective symmetric matrix and the prediction collective symmetric matrix. A relationship between two entities of the set of entities is predicted using the optimized prediction collective symmetric matrix. Entities of the same entity type may be indexed using a contiguous set of indices such that the entity type maps to a contiguous set of rows and corresponding contiguous set of columns in the observed collective symmetric matrix.
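The construction of the observed collective symmetric matrix can be sketched as follows. The function name, the entity identifiers, and the use of `None` for unobserved cells are assumptions made for this example, which also assumes entity identifiers are unique across types.

```python
def build_collective_matrix(entities_by_type, observations):
    """Build a symmetric matrix over all entities of all types.

    entities_by_type: dict mapping type name -> list of entity ids
    observations:     iterable of (entity_a, entity_b, value) triples
    Returns (matrix, index, blocks); unobserved cells hold None.
    """
    index = {}   # entity id -> row/column number
    blocks = {}  # type name -> contiguous range of rows/columns
    for etype, entities in entities_by_type.items():
        start = len(index)
        for entity in entities:
            index[entity] = len(index)
        blocks[etype] = range(start, len(index))
    size = len(index)
    matrix = [[None] * size for _ in range(size)]
    for a, b, value in observations:
        n, m = index[a], index[b]
        matrix[n][m] = matrix[m][n] = value  # element (n,m) = element (m,n)
    return matrix, index, blocks

# Three hypothetical entity types and a few observed relationships.
entities = {"user": ["u1", "u2"], "item": ["i1", "i2", "i3"], "tag": ["t1"]}
observed = [("u1", "i2", 5.0), ("i2", "t1", 1.0), ("u2", "u1", 1.0)]
M, index, blocks = build_collective_matrix(entities, observed)
```

Because each type occupies a contiguous index range, the user-item relationships form one rectangular block of `M` and its transpose appears in the mirrored block.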
Abstract:
A computer-implemented system and method compute a reference behavior for a user, such as a new user of a set of shared devices or services. The method includes acquiring usage data for an initial set of users of the devices and extracting features from the usage data. A model is learned with the extracted features for predicting a user role profile for a new user based on features extracted from the new user's usage data. The user role profile associates the user with at least one of a set of roles. A new user's usage data is received and, with the trained model, a user role profile is predicted for the new user based on features extracted from the new user's usage data. A reference behavior is computed for the user based on the predicted user role profile and the reference behaviors for roles in the set of roles.
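The final step, combining a predicted role profile with per-role reference behaviors, might look like the sketch below. The abstract does not say how the two are combined, so the weighted average, the role names, and the "pages printed per week" quantity are assumptions made for illustration.

```python
def reference_behavior(role_profile, role_references):
    """Weighted average of per-role reference behaviors.

    role_profile:    dict mapping role -> weight (e.g. predicted probability)
    role_references: dict mapping role -> reference behavior (a number)
    """
    total = sum(role_profile.values())
    weighted = sum(w * role_references[role] for role, w in role_profile.items())
    return weighted / total

# Hypothetical reference behaviors: pages printed per week for each role.
role_references = {"manager": 120.0, "engineer": 40.0, "admin": 300.0}
# Hypothetical profile predicted by the trained model for a new user.
new_user_profile = {"manager": 0.25, "engineer": 0.75}
expected_usage = reference_behavior(new_user_profile, role_references)
```

For this profile the result is 0.25 × 120 + 0.75 × 40 = 60 pages per week, which could then serve as the baseline against which the new user's actual usage is compared.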
Abstract:
A collective matrix is constructed, having a diagnostic sessions dimension and a diagnostic state descriptors dimension. The diagnostic state descriptors dimension includes a plurality of symptom fields, a plurality of root cause fields, and a plurality of solution fields. Collective matrix factorization of the collective matrix is performed to generate a factored collective matrix comprising a sessions factor matrix embedding diagnostic sessions and a descriptors factor matrix embedding diagnostic state descriptors. An in-progress diagnostic session is embedded in the factored collective matrix. A symptom or solution is recommended for evaluation in the in-progress diagnostic session based on the embedding. The diagnostic state descriptors dimension may further include at least one information field storing a representation (for example, a bag-of-words representation) of a semantic description of a problem being diagnosed by the in-progress diagnostic session.
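The recommendation step can be sketched as scoring each candidate descriptor against the embedding of the in-progress session. This assumes the session has already been embedded; the dot-product scoring rule, the function name, and the example descriptors and vectors are all assumptions made for illustration.

```python
def recommend(session_vec, descriptors, observed, kinds=("symptom", "solution")):
    """Return the best-scoring unobserved symptom or solution field.

    session_vec: embedding of the in-progress diagnostic session
    descriptors: dict mapping (kind, name) -> embedding vector
    observed:    set of (kind, name) keys already evaluated in the session
    """
    scores = {
        key: sum(s * d for s, d in zip(session_vec, vec))
        for key, vec in descriptors.items()
        if key[0] in kinds and key not in observed
    }
    return max(scores, key=scores.get)

# Hypothetical 2-dimensional embeddings from the descriptors factor matrix.
descriptors = {
    ("symptom", "paper jam"): [1.0, 0.0],
    ("symptom", "streaks"): [0.0, 1.0],
    ("root cause", "worn roller"): [0.9, 0.1],
    ("solution", "replace roller"): [0.8, 0.2],
    ("solution", "clean drum"): [0.1, 0.9],
}
session_vec = [0.9, 0.1]
already_seen = {("symptom", "paper jam")}
suggestion = recommend(session_vec, descriptors, already_seen)
```

Root cause fields are excluded from the candidates here because the abstract recommends only symptoms or solutions for evaluation.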
Abstract:
In a parallel computing method performed by a parallel computing system comprising a plurality of central processing units (CPUs), a main process executes. Tasks are executed in parallel with the main process on CPUs not used in executing the main process. Results of completed tasks are stored in a cache, from which the main process retrieves completed task results when needed. The initiation of task execution is controlled by a priority ranking of tasks based on at least probabilities that task results will be needed by the main process and time limits for executing the tasks. The priority ranking of tasks is from the vantage point of a current execution point in the main process and is updated as the main process executes. An executing task may be pre-empted by a task having higher priority if no idle CPU is available.
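One scheduling step under such a priority ranking is sketched below. The abstract says only that the ranking depends on at least the probability of need and the time limit, so the score `p_needed / time_limit`, the `Task` fields, and the `schedule` function are hypothetical choices for this example. No real processes are launched.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    p_needed: float    # probability the main process will need this result
    time_limit: float  # time available to finish the task, in seconds

def priority(task):
    # Hypothetical score: likelier and more urgent tasks rank higher.
    return task.p_needed / task.time_limit

def schedule(pending, running, n_cpus):
    """Decide which tasks to start and which to pre-empt in one step.

    n_cpus counts only the CPUs not used by the main process.
    """
    ranked = sorted(pending, key=priority, reverse=True)
    active = sorted(running, key=priority)  # lowest priority first
    idle = n_cpus - len(active)
    to_start, to_preempt = [], []
    for task in ranked:
        if idle > 0:
            to_start.append(task)
            idle -= 1
        elif active and priority(task) > priority(active[0]):
            # No idle CPU: pre-empt the lowest-priority running task.
            to_preempt.append(active.pop(0))
            to_start.append(task)
        else:
            break  # remaining pending tasks rank even lower
    return to_start, to_preempt

running = [Task("a", 0.2, 10.0), Task("b", 0.9, 2.0)]
pending = [Task("c", 0.6, 3.0), Task("d", 0.1, 20.0)]
to_start, to_preempt = schedule(pending, running, n_cpus=2)
```

In this toy run both CPUs are busy, so task `c` pre-empts the lower-ranked task `a`, while `d` stays pending. In a full system the probabilities and time limits would be recomputed as the main process advances, and `schedule` would be called again.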
Abstract:
A multi-relational data set is represented by a probabilistic multi-relational data model in which each entity of the multi-relational data set is represented by a D-dimensional latent feature vector. The probabilistic multi-relational data model is trained using a collection of observations of relations between entities of the multi-relational data set. The collection of observations includes observations of at least two different relation types. A prediction is generated for an observation of a relation between two or more entities of the multi-relational data set based on a dot product of the optimized D-dimensional latent feature vectors representing the two or more entities. The training may comprise optimizing the D-dimensional latent feature vectors to maximize likelihood of the collection of observations, for example by Bayesian inference performed using Gibbs sampling.
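The prediction step can be illustrated with a generalized dot product over latent feature vectors. For two entities this is the ordinary dot product. For three or more, the sketch assumes the sum over dimensions of the element-wise product, which is one common generalization and is not confirmed by the abstract. The vectors are made-up values standing in for trained latent features.

```python
import math

def predict(*vectors):
    """Sum over d of the product of the d-th components of all vectors."""
    return sum(math.prod(components) for components in zip(*vectors))

# Hypothetical trained latent feature vectors with D = 2.
user = [1.0, 0.5]
item = [2.0, 4.0]
tag = [0.5, 1.0]

rating_score = predict(user, item)        # two-entity relation
tagging_score = predict(user, item, tag)  # three-entity relation
```

Here `rating_score` is 1.0 × 2.0 + 0.5 × 4.0 = 4.0 and `tagging_score` is 1.0 × 2.0 × 0.5 + 0.5 × 4.0 × 1.0 = 3.0. Because the same entity vectors are reused across relation types, observations of one relation type can inform predictions for another.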