Abstract:
Systems and methods for processing video are provided. The method includes receiving a text-based description of active scenes and representing the text-based description as a word embedding matrix. The method includes using a text encoder implemented by a neural network to output a frame-level textual representation and a video-level representation of the word embedding matrix. The method also includes generating, by a shared generator, frame-by-frame video based on the frame-level textual representation, the video-level representation, and noise vectors. A frame-level and a video-level convolutional filter of a video discriminator are generated to classify frames and video of the frame-by-frame video as true or false. The method also includes training a conditional video generator that includes the text encoder, the video discriminator, and the shared generator in a generative adversarial network to convergence.
Abstract:
Systems and methods are disclosed for Natural Language Processing (NLP) that apply metric labeling to a sentence matching problem by preprocessing a dataset of sentences into object graphs and label graphs; given an object graph and a label graph, assigning nodes of the object graph to nodes of the label graph by minimizing an objective function that includes an assignment cost and a separation cost; and applying the metric labeling to match two sentences, where the objective function value is used as a similarity score between the sentences for classification, clustering, or ranking.
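The objective described above (assignment cost plus separation cost) can be illustrated with a minimal sketch. The function names, the dictionary-based graph encoding, and the brute-force search are assumptions made for illustration only; the abstract does not state how the minimization is carried out.

```python
from itertools import product


def metric_labeling_cost(assignment, assign_cost, object_edges, label_dist):
    """Return assignment cost + separation cost for one node assignment.

    assignment:   object node -> label node
    assign_cost:  assign_cost[obj][label], cost of giving obj that label
    object_edges: (u, v) -> edge weight in the object graph
    label_dist:   label_dist[a][b], distance between labels in the label graph
    """
    # Assignment cost: how poorly each object node fits its chosen label node.
    a_cost = sum(assign_cost[obj][lab] for obj, lab in assignment.items())
    # Separation cost: connected object nodes should map to nearby label nodes.
    s_cost = sum(
        weight * label_dist[assignment[u]][assignment[v]]
        for (u, v), weight in object_edges.items()
    )
    return a_cost + s_cost


def best_assignment(objects, labels, assign_cost, object_edges, label_dist):
    """Brute-force minimizer (hypothetical helper); only viable for tiny graphs."""
    best = None
    for combo in product(labels, repeat=len(objects)):
        assignment = dict(zip(objects, combo))
        cost = metric_labeling_cost(assignment, assign_cost, object_edges, label_dist)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best
```

The minimum cost returned by `best_assignment` would play the role of the similarity score between the two sentences, with lower values indicating a closer match.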
Abstract:
Systems and methods are disclosed to answer free-form questions using a recursive neural network (RNN) by defining feature representations at every node of the parse trees of questions and supporting sentences, computed recursively starting from token vectors from a neural probabilistic language model; and extracting answers to arbitrary natural language questions from the supporting sentences.
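The recursive computation of node representations over a parse tree can be sketched as follows. The tanh composition, the nested-tuple tree encoding, and all names here are assumptions for illustration; the abstract does not specify the composition function.

```python
import math


def compose(left, right, weights, bias):
    """Combine two child vectors into a parent vector: tanh(W [left; right] + b)."""
    children = left + right  # concatenate the two child vectors
    return [
        math.tanh(sum(w * x for w, x in zip(row, children)) + b)
        for row, b in zip(weights, bias)
    ]


def encode(tree, token_vectors, weights, bias):
    """Return the feature vector for the root of a binary parse tree.

    A tree is either a token string (leaf) or a (left, right) tuple.
    Leaves start from token vectors, e.g. taken from a language model.
    """
    if isinstance(tree, str):
        return token_vectors[tree]
    left, right = tree
    return compose(
        encode(left, token_vectors, weights, bias),
        encode(right, token_vectors, weights, bias),
        weights,
        bias,
    )
```

Because the same `weights` and `bias` are reused at every internal node, the representation of any subtree has the same dimension as a token vector.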
Abstract:
A method is provided for visual inspection. The method includes learning, by a processor, group disentangled visual feature embedding vectors of input images. The input images include defective objects and defect-free objects. The method further includes generating, by the processor using a weight generation network, classification weights from visual features and semantic descriptions. Both the visual features and the semantic descriptions are for predicting defective and defect-free labels. The method also includes calculating, by the processor, a cosine similarity score between the classification weights and the group disentangled visual feature embedding vectors. The method additionally includes episodically training, by the processor, the weight generation network on the input images to update parameters of the weight generation network. The method further includes generating, by the processor using the trained weight generation network, a prediction of a test image as including any of defective objects and defect-free objects.
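The cosine similarity scoring step can be illustrated with a minimal sketch. The label names, the dictionary of per-class weights, and the `predict_label` helper are assumptions for illustration; in the method described above the weights would come from the trained weight generation network.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_label(embedding, class_weights):
    """Score an image embedding against each class weight vector.

    class_weights: label -> weight vector (hypothetical stand-in for the
    output of a weight generation network).
    Returns the best-scoring label and all scores.
    """
    scores = {
        label: cosine_similarity(embedding, weights)
        for label, weights in class_weights.items()
    }
    return max(scores, key=scores.get), scores
```

Cosine similarity depends only on direction, so embeddings and weights of different magnitudes can still be compared on the same scale.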
Abstract:
Systems and methods for predicting new relationships in a knowledge graph, including embedding a partial triplet including a head entity description and a relationship or a tail entity description to produce a separate vector for each of the head, relationship, and tail. The vectors for the head entity, relationship, and tail entity can be combined into a first matrix, and adaptive kernels generated from the entity descriptions can be applied to the matrix through convolutions to produce a second matrix having a different dimension from the first matrix. An activation function can be applied to the second matrix to obtain non-negative feature maps, and max-pooling can be used over the feature maps to get subsamples. The subsampled feature maps are flattened into a fixed-length feature vector, Z, and a linear mapping is used to map the feature vector to a prediction score.
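The scoring pipeline above (stack, convolve, activate, max-pool, flatten, linear map) can be sketched in plain Python. This is a sketch under stated assumptions: the kernels are passed in as fixed arrays rather than generated from entity descriptions, ReLU is assumed as the activation, and all function names are hypothetical.

```python
def conv2d_valid(matrix, kernel):
    """2-D 'valid' cross-correlation, the operation deep learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    return [
        [
            sum(matrix[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(matrix):
    """Assumed activation: clamps negatives to zero, giving non-negative feature maps."""
    return [[max(0.0, x) for x in row] for row in matrix]


def max_pool_1d(row, size):
    """Max over non-overlapping windows along one row."""
    return [max(row[i:i + size]) for i in range(0, len(row) - size + 1, size)]


def score_triplet(head, rel, tail, kernels, weights, bias=0.0, pool=2):
    """Return (prediction score, flattened feature vector Z) for one triplet."""
    first = [head, rel, tail]  # first matrix: 3 x d
    features = []              # Z: flattened, pooled feature maps
    for kernel in kernels:
        second = relu(conv2d_valid(first, kernel))  # second matrix, smaller than the first
        for row in second:
            features.extend(max_pool_1d(row, pool))
    score = sum(w * f for w, f in zip(weights, features)) + bias  # linear mapping
    return score, features
```

With a 3 x 2 kernel and 4-dimensional embeddings, the second matrix is 1 x 3, which shows how the convolution changes the matrix dimension before pooling.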
Abstract:
Systems and methods for document analysis include identifying candidates in a corpus matching a requested expression. String kernel features are extracted for each candidate. Each candidate is classified according to the string kernel features using a machine learning model. A report is generated that identifies instances of the requested expression in the corpus that match a requested class.
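String kernel feature extraction can be illustrated with a k-spectrum kernel, which counts contiguous character substrings of length k. The abstract does not say which string kernel is used, so the choice of spectrum kernel and the function names below are assumptions for illustration.

```python
from collections import Counter


def spectrum_features(text, k=3):
    """Count every contiguous substring of length k in text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-spectrum feature vectors of two strings."""
    features_a = spectrum_features(a, k)
    features_b = spectrum_features(b, k)
    # Counter returns 0 for missing keys, so unshared substrings contribute nothing.
    return sum(count * features_b[gram] for gram, count in features_a.items())
```

The feature counts from `spectrum_features` could be fed to a classifier directly, or `spectrum_kernel` could serve as the kernel function of a kernel-based model.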
Abstract:
A camera device and camera system for video-based workplace safety are provided. The camera device includes at least one imaging sensor configured to capture one or more video sequences in a workplace environment having a plurality of machines therein. The camera device further includes a processor. The processor is configured to generate a plurality of embedding vectors based on a plurality of observations. The observations include (i) a subject, (ii) an action taken by the subject, and (iii) an object on which the subject is taking the action. The subject and object are constant. The processor is further configured to generate predictions of one or more future events based on one or more comparisons of at least some of the plurality of embedding vectors. The processor is configured to generate a signal for initiating an action to at least one of the plurality of machines to mitigate harm.
Abstract:
Systems and methods are disclosed for determining complex interactions among system inputs by using semi-Restricted Boltzmann Machines (RBMs) with factorized gated interactions of different orders to model complex interactions among system inputs; applying semi-RBMs to train a deep neural network with high-order within-layer interactions for learning a distance metric and a feature mapping; and tuning the deep neural network by minimizing margin violations between positive query-document pairs and corresponding negative pairs.
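The idea of a margin violation between a positive pair and a negative pair can be sketched with a hinge-style loss. The Euclidean distance, the margin of 1.0, and the function names are assumptions for illustration; the abstract does not give the exact loss, and in the described system the vectors would be outputs of the learned feature mapping.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def margin_violation(query, positive, negative, margin=1.0):
    """Hinge-style penalty for one (query, positive, negative) triple.

    Zero when the negative document is farther from the query than the
    positive document by at least `margin`; positive otherwise.
    """
    return max(0.0, margin + euclidean(query, positive) - euclidean(query, negative))
```

Summing this penalty over training triples gives a quantity that tuning could minimize, pushing relevant documents closer to their queries than irrelevant ones.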
Abstract:
Systems and methods are disclosed for representing a word by extracting n dimensions for the word from an original language model; if the word has been previously processed, using the values previously chosen to define an (n+m)-dimensional vector, and otherwise randomly selecting m values to define the (n+m)-dimensional vector; and applying the (n+m)-dimensional vector to represent words that are not well represented in the language model.
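The (n+m)-dimensional representation can be sketched as a lookup plus a cache of random values. The class name, the uniform range of the random values, and the zero vector used for words absent from the base model are assumptions for illustration.

```python
import random


class ExtendedEmbedder:
    """Extend n-dimensional base vectors with m cached random dimensions per word."""

    def __init__(self, base_vectors, n, m, seed=0):
        self.base_vectors = base_vectors  # word -> list of at least n floats
        self.n = n
        self.m = m
        self._rng = random.Random(seed)
        self._extra = {}  # word -> the m values chosen the first time it was seen

    def vector(self, word):
        # Assumption: words missing from the base model get zeros for the n base dimensions.
        base = list(self.base_vectors.get(word, [0.0] * self.n))[: self.n]
        if word not in self._extra:
            # First time this word is processed: randomly select its m extra values.
            self._extra[word] = [self._rng.uniform(-1.0, 1.0) for _ in range(self.m)]
        # Previously processed words reuse the values chosen earlier.
        return base + self._extra[word]
```

Caching the random values keeps a word's vector stable across calls, so two poorly represented words remain distinguishable even when their base dimensions are identical.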