Abstract:
Approaches for multitask learning as question answering include an input layer for encoding a context and a question, a self-attention-based transformer including an encoder and a decoder, a first bi-directional long short-term memory (biLSTM) for further encoding an output of the encoder, a long short-term memory (LSTM) for generating a context-adjusted hidden state from the output of the decoder and a hidden state, an attention network for generating first attention weights based on an output of the first biLSTM and an output of the LSTM, a vocabulary layer for generating a distribution over a vocabulary, a context layer for generating a distribution over the context, and a switch for generating a weighting between the distributions over the vocabulary and the context, generating a composite distribution based on the weighting, and selecting a word of an answer using the composite distribution.
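The switch step can be illustrated with a minimal sketch. It assumes the weighting is a single scalar `gamma` in [0, 1] placed on the vocabulary distribution, with the remainder placed on the context distribution; the function names, the scalar form of the weighting, and the toy numbers are illustrative assumptions, not details taken from the abstract.

```python
def composite_distribution(p_vocab, p_context, context_tokens, gamma):
    """Mix a vocabulary distribution with a distribution over context positions.

    p_vocab:        dict mapping token -> probability over the vocabulary
    p_context:      list of probabilities, one per context position
    context_tokens: tokens aligned with p_context
    gamma:          assumed scalar switch weight on the vocabulary distribution
    """
    mixed = {tok: gamma * p for tok, p in p_vocab.items()}
    for tok, p in zip(context_tokens, p_context):
        # Context tokens may repeat or fall outside the vocabulary, so accumulate.
        mixed[tok] = mixed.get(tok, 0.0) + (1.0 - gamma) * p
    return mixed


def select_word(mixed):
    """Pick the most probable word under the composite distribution."""
    return max(mixed, key=mixed.get)


# Toy example (hypothetical values).
p_vocab = {"yes": 0.6, "no": 0.3, "paris": 0.1}
context_tokens = ["the", "capital", "is", "paris"]
p_context = [0.05, 0.05, 0.1, 0.8]

mixed = composite_distribution(p_vocab, p_context, context_tokens, gamma=0.25)
word = select_word(mixed)  # "paris"
```

With a small `gamma`, the composite distribution favors copying from the context, so a word with high context probability is selected even when it has low vocabulary probability.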
Abstract:
Approaches for interpretable counting for visual question answering include a digital image processor, a language processor, a scorer, and a counter. The digital image processor identifies objects in a digital image, maps the identified objects into an embedding space, generates bounding boxes for each of the identified objects, and outputs the embedded objects paired with their bounding boxes. The language processor embeds a question into the embedding space. The scorer determines scores for the identified objects. Each respective score indicates how well a corresponding one of the identified objects is responsive to the question. The counter determines a count of the objects in the digital image that are responsive to the question based on the scores. The count and a corresponding bounding box for each object included in the count are output. In some embodiments, the counter determines the count interactively based on interactions between counted and uncounted objects.
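One way to picture counting with interactions between counted and uncounted objects is a greedy loop, sketched below. This is an assumption made for illustration only: the abstract does not state how interactions are computed. Here the interaction is modeled as a score penalty proportional to bounding-box overlap, and `threshold` and `overlap_penalty` are hypothetical parameters.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_objects(scores, boxes, threshold=0.5, overlap_penalty=1.0):
    """Greedily count objects, lowering scores of uncounted overlapping objects."""
    remaining = dict(enumerate(scores))
    counted = []
    while remaining:
        best = max(remaining, key=remaining.get)
        if remaining[best] < threshold:
            break  # No remaining object is responsive enough to the question.
        counted.append(best)
        del remaining[best]
        # Assumed interaction: a counted object suppresses overlapping candidates.
        for i in remaining:
            remaining[i] -= overlap_penalty * iou(boxes[best], boxes[i])
    return len(counted), [boxes[i] for i in counted]


# Toy example: boxes 0 and 1 overlap heavily, so only one of them is counted.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.2]
count, counted_boxes = count_objects(scores, boxes)
```

Returning the bounding boxes alongside the count mirrors the abstract's point that each counted object can be shown to the user.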
Abstract:
The technology disclosed presents a novel spatial attention model that uses current hidden state information of a decoder long short-term memory (LSTM) to guide attention and to extract spatial image features for use in image captioning. The technology disclosed also presents a novel adaptive attention model for image captioning that mixes visual information from a convolutional neural network (CNN) and linguistic information from an LSTM. At each timestep, the adaptive attention model automatically decides how heavily to rely on the image, as opposed to the linguistic model, to emit the next caption word. The technology disclosed further adds a new auxiliary sentinel gate to an LSTM architecture and produces a sentinel LSTM (Sn-LSTM). The sentinel gate produces a visual sentinel at each timestep, which is an additional representation, derived from the LSTM's memory, of long- and short-term visual and linguistic information.
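The trade-off between image and language information can be sketched as a single softmax taken over the image-region attention scores plus one extra score for the visual sentinel. The weight that lands on the sentinel then measures how much the model relies on the linguistic side. This formulation, and every name below, is an assumption made for illustration; the abstract does not give the equations.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(region_features, region_logits, sentinel, sentinel_logit):
    """Blend image-region features with the visual sentinel.

    Returns the blended context vector and beta, the weight on the sentinel.
    """
    weights = softmax(region_logits + [sentinel_logit])
    beta = weights[-1]
    dim = len(sentinel)
    context = [beta * sentinel[d] for d in range(dim)]
    for w, feat in zip(weights[:-1], region_features):
        for d in range(dim):
            context[d] += w * feat[d]
    return context, beta


# Toy example: two image regions and a sentinel, all with equal scores.
region_features = [[1.0, 0.0], [0.0, 1.0]]
context, beta = adaptive_context(
    region_features, [0.0, 0.0], sentinel=[0.5, 0.5], sentinel_logit=0.0
)
```

A `beta` near 1 corresponds to emitting a word mostly from linguistic memory, while a `beta` near 0 corresponds to relying on the image.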
Abstract:
A system includes a neural network for performing a first natural language processing task. The neural network includes a first rectified linear unit capable of executing an activation function on a first input related to a first word sequence, and a second rectified linear unit capable of executing an activation function on a second input related to a second word sequence. A first encoder is capable of receiving the result from the first rectified linear unit and generating a first task-specific representation relating to the first word sequence, and a second encoder is capable of receiving the result from the second rectified linear unit and generating a second task-specific representation relating to the second word sequence. A biattention mechanism is capable of computing, based on the first and second task-specific representations, an interdependent representation related to the first and second word sequences. In some embodiments, the first natural language processing task performed by the neural network is one of sentiment classification and entailment classification.
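A biattention step can be sketched as follows, assuming a dot-product affinity between every pair of positions in the two sequences, followed by a softmax in each direction. The abstract does not specify the affinity function or the normalization, so both are assumptions, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Sum of vectors scaled by the matching weights."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def biattention(x, y):
    """Return summaries of y for each x position and of x for each y position."""
    affinity = [[dot(xi, yj) for yj in y] for xi in x]
    # Each position in x attends over all positions in y (row-wise softmax).
    x_attends_y = [weighted_sum(softmax(row), y) for row in affinity]
    # Each position in y attends over all positions in x (column-wise softmax).
    columns = [list(col) for col in zip(*affinity)]
    y_attends_x = [weighted_sum(softmax(col), x) for col in columns]
    return x_attends_y, y_attends_x


# Toy example: a two-token sequence and a one-token sequence.
x = [[1.0, 0.0], [0.0, 1.0]]
y = [[1.0, 0.0]]
x_attends_y, y_attends_x = biattention(x, y)
```

Because both outputs are computed from the same affinity matrix, each sequence's summary depends on the other sequence, which is the sense in which the representation is interdependent.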
Abstract:
The technology disclosed proposes using a combination of a computationally cheap, less-accurate bag of words (BoW) model and a computationally expensive, more-accurate long short-term memory (LSTM) model to perform natural language processing tasks such as sentiment analysis. The use of the cheap, less-accurate BoW model is referred to herein as “skimming”. The use of the expensive, more-accurate LSTM model is referred to herein as “reading”. The technology disclosed presents a probability-based guider (PBG). The PBG combines the use of the BoW model and the LSTM model. The PBG uses a probability thresholding strategy to determine, based on the results of the BoW model, whether to invoke the LSTM model for reliably classifying a sentence as positive or negative. The technology disclosed also presents a deep neural network-based decision network (DDN) that is trained to learn the relationship between the BoW model and the LSTM model and to invoke only one of the two models.
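The probability thresholding strategy can be sketched as below. The two model functions are toy stand-ins written only so the example runs, and the threshold value is arbitrary; none of them come from the source.

```python
def guided_classify(sentence, bow_model, lstm_model, threshold=0.7):
    """Skim with the cheap model; read with the expensive one only if unsure.

    Both models are assumed to return P(positive) for the sentence.
    Returns the predicted label and which mode produced it.
    """
    p_pos = bow_model(sentence)
    confidence = max(p_pos, 1.0 - p_pos)
    if confidence >= threshold:
        return ("positive" if p_pos >= 0.5 else "negative"), "skim"
    p_pos = lstm_model(sentence)
    return ("positive" if p_pos >= 0.5 else "negative"), "read"


# Toy stand-ins (hypothetical): a word-count scorer and a fixed-rule "LSTM".
POSITIVE, NEGATIVE = {"great", "good"}, {"bad", "awful"}


def toy_bow(sentence):
    words = sentence.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos + 1) / (pos + neg + 2)  # Smoothed share of positive words.


def toy_lstm(sentence):
    return 0.8 if "not bad" in sentence.lower() else 0.5


easy = guided_classify("great good movie", toy_bow, toy_lstm)
hard = guided_classify("not bad", toy_bow, toy_lstm)
```

In the toy run, the first sentence is classified by skimming alone, while the negated phrase falls below the confidence threshold and is passed to the more expensive model.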
Abstract:
Deep learning is applied to combined image and text analysis of messages that include images and text. A convolutional neural network is trained against the images and a recurrent neural network against the text. A classifier predicts human response to the message, including classifying reactions to the image, to the text, and overall to the message. Visualizations are provided of neural network analytic emphasis on parts of the images and text. Other types of media in messages can also be analyzed by a combination of specialized neural networks.
Abstract:
The technology disclosed uses a 3D deep convolutional neural network architecture (DCNNA) equipped with so-called subnetwork modules, which perform dimensionality reduction operations on a 3D radiological volume before the 3D radiological volume is subjected to computationally expensive operations. Also, the subnetworks convolve 3D data at multiple scales by subjecting the 3D data to parallel processing by different 3D convolutional layer paths. Such multi-scale operations are computationally cheaper than traditional CNNs that perform serial convolutions. In addition, performance of the subnetworks is further improved through 3D batch normalization (BN) that normalizes the 3D input fed to the subnetworks, which in turn increases learning rates of the 3D DCNNA. After several layers of 3D convolution and 3D sub-sampling across a series of subnetwork modules, a feature map with reduced vertical dimensionality is generated from the 3D radiological volume and fed into one or more fully connected layers.
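The 3D batch normalization step can be illustrated on a toy batch. The sketch assumes single-channel volumes stored as nested lists indexed as [batch][depth][height][width], with the mean and variance taken over every voxel in the batch; `gamma`, `beta`, and `eps` are the usual scale, shift, and stability terms, added here as assumptions.

```python
def batch_norm_3d(volumes, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of single-channel 3D volumes to zero mean, unit variance."""
    voxels = [v for vol in volumes for plane in vol for row in plane for v in row]
    mean = sum(voxels) / len(voxels)
    var = sum((v - mean) ** 2 for v in voxels) / len(voxels)
    scale = gamma / (var + eps) ** 0.5
    return [
        [[[(v - mean) * scale + beta for v in row] for row in plane] for plane in vol]
        for vol in volumes
    ]


# Toy batch: two volumes, each with depth 1, height 1, and width 2.
batch = [[[[1.0, 2.0]]], [[[3.0, 4.0]]]]
normalized = batch_norm_3d(batch)
flat = [v for vol in normalized for plane in vol for row in plane for v in row]
```

A practical implementation would keep one mean and variance per channel and learn `gamma` and `beta` during training; the sketch omits both for brevity.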
Abstract:
Systems and methods are provided for efficient off-policy credit assignment (ECA) in reinforcement learning. ECA allows principled credit assignment for off-policy samples, and therefore improves sample efficiency and asymptotic performance. One aspect of ECA is to formulate the optimization of expected return as approximate inference, in which the policy approximates a learned prior distribution, which leads to a principled way of utilizing off-policy samples. Other features are also provided.
Abstract:
Approaches for private and interpretable machine learning systems include a system for processing a query. The system includes one or more teacher modules for receiving a query and generating a respective output, one or more privacy sanitization modules for privacy sanitizing the respective output of each of the one or more teacher modules, and a student module for receiving a query and the privacy sanitized respective output of each of the one or more teacher modules and generating a result. Each of the one or more teacher modules is trained using a respective private data set. The student module is trained using a public data set. In some embodiments, human-understandable interpretations of an output from the student module are provided to a model user.
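The data flow from teachers through sanitization to the student can be sketched as below. The sanitization mechanism shown, adding Laplace noise to each teacher's output, is an assumption chosen for illustration; the abstract does not name a mechanism. The teachers are constant stand-ins, and the student simply averages the sanitized outputs instead of being a trained model.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sanitize(output, scale, rng):
    """Assumed sanitization: perturb each entry of a teacher output with noise."""
    return [p + laplace_noise(scale, rng) for p in output]


def student_predict(query, sanitized_outputs):
    """Hypothetical student: average the sanitized outputs and pick the top class.

    A trained student would also use the query; this stand-in ignores it.
    """
    n = len(sanitized_outputs)
    avg = [sum(col) / n for col in zip(*sanitized_outputs)]
    return max(range(len(avg)), key=avg.__getitem__)


# Toy run: three stand-in teachers that each favor class 1.
rng = random.Random(0)
teachers = [lambda q: [0.1, 0.9] for _ in range(3)]
query = "example query"
sanitized = [sanitize(teacher(query), scale=0.01, rng=rng) for teacher in teachers]
result = student_predict(query, sanitized)
```

The key property of the arrangement is that the student never sees a raw teacher output, only the sanitized version, so nothing derived directly from a private data set reaches it.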
Abstract:
Approaches for multitask learning as question answering include a method for training that includes receiving a plurality of training samples including training samples from a plurality of task types, presenting the training samples to a neural model to generate an answer, determining an error between the generated answer and the natural language ground truth answer for each training sample presented, and adjusting parameters of the neural model based on the error. Each of the training samples includes a natural language context, question, and ground truth answer. An order in which the training samples are presented to the neural model includes initially selecting the training samples according to a first training strategy and switching to selecting the training samples according to a second training strategy. In some embodiments, the first training strategy is a sequential training strategy and the second training strategy is a joint training strategy.
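The switch between training strategies can be sketched as a function that produces the presentation order. The sketch assumes that "sequential" means presenting one task type at a time and that "joint" means sampling uniformly from all task types mixed together; both readings, and the fixed switch point `switch_after`, are assumptions made for illustration.

```python
import random


def training_order(samples_by_task, switch_after, total_steps, seed=0):
    """Return (task, sample) pairs: sequential first, then joint sampling."""
    rng = random.Random(seed)
    # Sequential strategy (assumed): exhaust each task type in turn.
    sequential = [(task, s) for task, samples in samples_by_task.items() for s in samples]
    order = []
    for step in range(total_steps):
        if step < switch_after:
            order.append(sequential[step % len(sequential)])
        else:
            # Joint strategy (assumed): draw from the pooled samples of all tasks.
            order.append(rng.choice(sequential))
    return order


# Toy example with two task types (hypothetical sample names).
samples_by_task = {
    "question_answering": ["qa_1", "qa_2"],
    "translation": ["mt_1", "mt_2"],
}
order = training_order(samples_by_task, switch_after=4, total_steps=8)
```

In a training loop, each pair in `order` would be presented to the model in turn, with the error against the ground truth answer used to adjust the parameters.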