-
Publication Number: US10565305B2
Publication Date: 2020-02-18
Application Number: US15817161
Application Date: 2017-11-17
Applicant: salesforce.com, inc.
Inventor: Jiasen Lu , Caiming Xiong , Richard Socher
IPC: G06K9/00 , G06F17/27 , G06K9/62 , G06K9/46 , G06F17/24 , G06K9/48 , G06K9/66 , G06N3/08 , G06N3/04
Abstract: The technology disclosed presents a novel spatial attention model that uses current hidden state information of a decoder long short-term memory (LSTM) to guide attention and to extract spatial image features for use in image captioning. The technology disclosed also presents a novel adaptive attention model for image captioning that mixes visual information from a convolutional neural network (CNN) and linguistic information from an LSTM. At each timestep, the adaptive attention model automatically decides how heavily to rely on the image, as opposed to the linguistic model, to emit the next caption word. The technology disclosed further adds a new auxiliary sentinel gate to an LSTM architecture and produces a sentinel LSTM (Sn-LSTM). The sentinel gate produces a visual sentinel at each timestep, which is an additional representation, derived from the LSTM's memory, of long- and short-term visual and linguistic information.
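A minimal sketch of the sentinel-gate computation the abstract describes, following the adaptive-attention formulation g_t = σ(W_x x_t + W_h h_{t-1}), s_t = g_t ⊙ tanh(c_t) from the inventors' published work; the tensor names and dimensions below are illustrative, not taken from the patent:

```python
import torch

def visual_sentinel(x_t, h_prev, c_t, W_x, W_h):
    # Sentinel gate: g_t = sigmoid(x_t W_x + h_prev W_h)
    g_t = torch.sigmoid(x_t @ W_x + h_prev @ W_h)
    # Visual sentinel: s_t = g_t * tanh(c_t), distilled from the LSTM memory cell
    return g_t * torch.tanh(c_t)

# Toy dimensions: batch 2, embedding and hidden size 8 (illustrative only).
x_t, h_prev, c_t = torch.randn(2, 8), torch.randn(2, 8), torch.randn(2, 8)
W_x, W_h = torch.randn(8, 8), torch.randn(8, 8)
s_t = visual_sentinel(x_t, h_prev, c_t, W_x, W_h)
print(s_t.shape)  # torch.Size([2, 8])
```

The decoder can then mix s_t with the attended image context to decide how much to lean on the image versus the language model at each step.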
-
Publication Number: US20190251168A1
Publication Date: 2019-08-15
Application Number: US15974118
Application Date: 2018-05-08
Applicant: salesforce.com, inc.
Inventor: Bryan McCann , Nitish Shirish Keskar , Caiming Xiong , Richard Socher
Abstract: Approaches for multitask learning as question answering include an input layer for encoding a context and a question, a self-attention based transformer including an encoder and a decoder, a first bi-directional long short-term memory (biLSTM) for further encoding an output of the encoder, a long short-term memory (LSTM) for generating a context-adjusted hidden state from the output of the decoder and a hidden state, an attention network for generating first attention weights based on an output of the first biLSTM and an output of the LSTM, a vocabulary layer for generating a distribution over a vocabulary, a context layer for generating a distribution over the context, and a switch for generating a weighting between the distributions over the vocabulary and the context, generating a composite distribution based on the weighting, and selecting a word of an answer using the composite distribution.
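A hedged sketch of the switch at the end of that pipeline, which blends the vocabulary and context distributions into a composite distribution before greedy word selection; the shapes and the sigmoid-produced gamma are assumptions made for illustration:

```python
import torch

def composite_distribution(p_vocab, p_context, gamma):
    # Blend the two distributions with the switch weight gamma in [0, 1],
    # then pick the next answer word greedily from the composite.
    p = gamma * p_vocab + (1.0 - gamma) * p_context
    return p, p.argmax(dim=-1)

vocab_size = 100  # toy size
p_vocab = torch.softmax(torch.randn(1, vocab_size), dim=-1)    # vocabulary-layer output
p_context = torch.softmax(torch.randn(1, vocab_size), dim=-1)  # context-layer mass mapped onto the same ids
gamma = torch.sigmoid(torch.randn(1, 1))                       # switch output
p, next_word = composite_distribution(p_vocab, p_context, gamma)
```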
-
Publication Number: US11749264B2
Publication Date: 2023-09-05
Application Number: US17088206
Application Date: 2020-11-03
Applicant: salesforce.com, inc.
Inventor: Chien-Sheng Wu , Chu Hong Hoi , Richard Socher , Caiming Xiong
CPC classification number: G10L15/1815 , G10L15/063 , G10L15/1822
Abstract: Embodiments described herein provide methods and systems for training task-oriented dialogue (TOD) language models. In some embodiments, a TOD language model may receive a TOD dataset including a plurality of dialogues and a model input sequence may be generated from the dialogues using a first token prefixed to each user utterance and a second token prefixed to each system response of the dialogues. In some embodiments, the first token or the second token may be randomly replaced with a mask token to generate a masked training sequence and a masked language modeling (MLM) loss may be computed using the masked training sequence. In some embodiments, the TOD language model may be updated based on the MLM loss.
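A toy illustration of the masked-sequence construction described above; the token strings [USR], [SYS], and [MASK] are placeholder names, not the patent's actual tokens, and the masking rate is an assumption:

```python
import random

USER_TOKEN, SYS_TOKEN, MASK_TOKEN = "[USR]", "[SYS]", "[MASK]"  # placeholder names

def build_masked_sequence(dialogue, p_mask=0.15):
    # Prefix each turn with its speaker token, then randomly swap some
    # speaker tokens for the mask token to form an MLM training sequence.
    pieces = []
    for speaker, utterance in dialogue:
        prefix = USER_TOKEN if speaker == "user" else SYS_TOKEN
        if random.random() < p_mask:
            prefix = MASK_TOKEN
        pieces.append(f"{prefix} {utterance}")
    return " ".join(pieces)

dialogue = [("user", "book a table for two"), ("system", "for what time?")]
print(build_masked_sequence(dialogue))
```

Predicting the masked speaker tokens forces the model to learn who is talking, which is what makes the MLM loss dialogue-aware.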
-
Publication Number: US11687588B2
Publication Date: 2023-06-27
Application Number: US16531343
Application Date: 2019-08-05
Applicant: salesforce.com, inc.
Inventor: Mingfei Gao , Richard Socher , Caiming Xiong
IPC: G06F16/735 , G06F16/73 , G06V10/82 , G06F16/74 , G06V20/40 , G06F17/10 , G06N3/08 , G06F40/47 , G06F18/21 , G06V10/44
CPC classification number: G06F16/735 , G06F16/73 , G06F17/10 , G06F18/2185 , G06F40/47 , G06N3/08 , G06V10/82 , G06V20/41 , G06V20/49 , G06V10/454 , G06V20/44 , G06V20/46
Abstract: Systems and methods are provided for weakly supervised natural language localization (WSNLL), for example, as implemented in a neural network or model. The WSNLL network is trained with long, untrimmed videos, i.e., videos that have not been temporally segmented or annotated. The WSNLL network or model defines or generates a video-sentence pair, which corresponds to a pairing of an untrimmed video with an input text sentence. According to some embodiments, the WSNLL network or model is implemented with a two-branch architecture, where one branch performs segment-sentence alignment and the other conducts segment selection. These methods and systems are used to predict how well a video proposal matches a text query using their respective visual and text features.
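One way the two-branch scoring might look in code; the layer shapes, the sigmoid/softmax split between branches, and the product-of-branches combination are assumptions made for this sketch, not details from the patent:

```python
import torch
import torch.nn as nn

class TwoBranchScorer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.align = nn.Linear(2 * dim, 1)   # segment-sentence alignment branch
        self.select = nn.Linear(2 * dim, 1)  # segment selection branch

    def forward(self, seg_feats, txt_feat):
        # seg_feats: (num_segments, dim); txt_feat: (dim,)
        pairs = torch.cat([seg_feats, txt_feat.expand(seg_feats.size(0), -1)], dim=-1)
        align = torch.sigmoid(self.align(pairs))           # per-segment match score
        select = torch.softmax(self.select(pairs), dim=0)  # competition across segments
        return (align * select).squeeze(-1)                # combined proposal score

scores = TwoBranchScorer(dim=16)(torch.randn(5, 16), torch.randn(16))
```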
-
Publication Number: US11631009B2
Publication Date: 2023-04-18
Application Number: US16051309
Application Date: 2018-07-31
Applicant: salesforce.com, inc.
Inventor: Xi Victoria Lin , Caiming Xiong , Richard Socher
IPC: G06N20/00 , G06N5/04 , G06N3/04 , G06N3/08 , G06F16/903 , G06F16/901
Abstract: Approaches for multi-hop knowledge graph reasoning with reward shaping include a system and method of training a system to search relational paths in a knowledge graph. The method includes identifying, using a reasoning module, a plurality of first outgoing links from a current node in a knowledge graph, masking, using the reasoning module, one or more links from the plurality of first outgoing links to form a plurality of second outgoing links, rewarding the reasoning module with a reward of one when a node corresponding to an observed answer is reached, and rewarding the reasoning module with a reward identified by a reward shaping network when a node not corresponding to an observed answer is reached. In some embodiments, the reward shaping network is pre-trained.
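A minimal sketch of the reward rule described above: a hard reward of one at an observed answer node, otherwise the soft score from the pre-trained reward shaping network. The stand-in scorer below is a placeholder for a pre-trained knowledge-graph embedding model:

```python
def shaped_reward(node, answer_node, reward_shaping_net, query):
    # Hard reward of one at an observed answer; otherwise the soft score
    # from the pre-trained reward shaping network.
    if node == answer_node:
        return 1.0
    return reward_shaping_net(query, node)

# Placeholder scorer standing in for a pre-trained KG embedding model.
stub_net = lambda query, node: 0.3
print(shaped_reward("Paris", "Paris", stub_net, ("France", "capital_of")))  # 1.0
print(shaped_reward("Lyon", "Paris", stub_net, ("France", "capital_of")))   # 0.3
```

The soft fallback keeps the policy gradient informative even when the agent's path ends at an unobserved but plausible answer.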
-
Publication Number: US11599721B2
Publication Date: 2023-03-07
Application Number: US17002562
Application Date: 2020-08-25
Applicant: salesforce.com, inc.
Inventor: Shiva Kumar Pentyala , Mridul Gupta , Ankit Chadha , Indira Iyer , Richard Socher
IPC: G06F40/253 , G10L15/19 , G06F40/30
Abstract: A natural language processing system that trains task models for particular natural language tasks programmatically generates additional utterances for inclusion in the training set, based on the existing utterances in the training set and the existing state of a task model as generated from the original (non-augmented) training set. More specifically, the training augmentation module identifies specific textual units of utterances and generates variants of the utterances based on those identified units. The identification is based on the determined importance of the textual units to the output of the task model, as well as on task rules that correspond to the natural language task for which the task model is being generated. The generation of the additional utterances improves the quality of the task model without the expense of manually labeling utterances for training set inclusion.
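A rough sketch of importance-driven utterance augmentation; the importance scores, synonym table, and threshold below are illustrative assumptions standing in for the task model's attributions and the patent's task rules:

```python
def generate_variants(utterance, importance, synonyms, threshold=0.5):
    # Substitute alternatives for tokens the task model deems important,
    # yielding one new training utterance per substitution.
    tokens = utterance.split()
    variants = []
    for i, tok in enumerate(tokens):
        if importance.get(tok, 0.0) >= threshold:
            for alt in synonyms.get(tok, []):
                variants.append(" ".join(tokens[:i] + [alt] + tokens[i + 1:]))
    return variants

importance = {"book": 0.9, "flight": 0.8}  # toy importance scores
synonyms = {"book": ["reserve"], "flight": ["plane ticket"]}
print(generate_variants("book a flight to Boston", importance, synonyms))
# ['reserve a flight to Boston', 'book a plane ticket to Boston']
```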
-
Publication Number: US11580359B2
Publication Date: 2023-02-14
Application Number: US16664508
Application Date: 2019-10-25
Applicant: salesforce.com, inc.
Inventor: Stephen Joseph Merity , Caiming Xiong , James Bradbury , Richard Socher
IPC: G06N3/04 , G06N3/084 , G06F40/284 , G06N3/08 , G06N7/00
Abstract: The technology disclosed provides a so-called "pointer sentinel mixture architecture" for neural network sequence models that has the ability to either reproduce a token from the recent context or produce a token from a predefined vocabulary. In one implementation, a pointer sentinel-LSTM architecture achieves state-of-the-art language modeling performance of 70.9 perplexity on the Penn Treebank dataset, while using far fewer parameters than a standard softmax LSTM.
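A hedged sketch of the mixture at the heart of this architecture, following the published pointer sentinel formulation: the sentinel's gate value g splits probability mass between the vocabulary softmax and a pointer distribution over recent context tokens. The values below are toys:

```python
import torch

def pointer_sentinel_mixture(p_vocab, p_pointer, g):
    # g is the probability mass the sentinel assigns to the vocabulary
    # component; the remainder goes to the pointer over recent context.
    return g * p_vocab + (1.0 - g) * p_pointer

vocab_size = 10000
p_vocab = torch.softmax(torch.randn(1, vocab_size), dim=-1)
p_pointer = torch.zeros(1, vocab_size)
p_pointer[0, 42] = 1.0     # all pointer mass on one recently seen word
g = torch.tensor([[0.7]])  # sentinel gate value
p_word = pointer_sentinel_mixture(p_vocab, p_pointer, g)
```

Because the pointer copies rare words verbatim from context, the model needs far fewer softmax parameters to handle them.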
-
Publication Number: US11487939B2
Publication Date: 2022-11-01
Application Number: US16549985
Application Date: 2019-08-23
Applicant: salesforce.com, inc.
Inventor: Tong Niu , Caiming Xiong , Richard Socher
IPC: G06F40/284 , G06N3/08 , H03M7/42 , H03M7/30 , G06F40/40
Abstract: Embodiments described herein provide a fully unsupervised model for text compression. Specifically, the unsupervised model is configured to identify an optimal deletion path for each input sequence of texts (e.g., a sentence), and words from the input sequence are gradually deleted along the deletion path. To identify the optimal deletion path, the unsupervised model may adopt a pretrained bidirectional language model (BERT) to score each candidate deletion based on the average perplexity of the resulting sentence and perform a simple greedy look-ahead tree search to select the best deletion at each step.
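A minimal sketch of the deletion search, reduced here to one-step greedy selection that scores every candidate single-word deletion per step; the stand-in perplexity function is a placeholder for the pretrained BERT scorer:

```python
def greedy_deletion_path(sentence, avg_perplexity, min_len=3):
    # At each step, score every single-word deletion and keep the candidate
    # whose resulting sentence has the lowest average perplexity.
    tokens = sentence.split()
    path = [" ".join(tokens)]
    while len(tokens) > min_len:
        candidates = [tokens[:i] + tokens[i + 1:] for i in range(len(tokens))]
        tokens = min(candidates, key=lambda c: avg_perplexity(" ".join(c)))
        path.append(" ".join(tokens))
    return path

# Toy stand-in scorer (prefers shorter strings); the real model scores with BERT.
stub_ppl = len
for step in greedy_deletion_path("the quick brown fox jumps over", stub_ppl):
    print(step)
```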
-
Publication Number: US11409945B2
Publication Date: 2022-08-09
Application Number: US17027130
Application Date: 2020-09-21
Applicant: salesforce.com, inc.
Inventor: Bryan McCann , Caiming Xiong , Richard Socher
IPC: G06F40/126 , G06N3/08 , G06N3/04 , G06F40/30 , G06F40/47 , G06F40/205 , G06F40/289 , G06F40/44 , G06F40/58
Abstract: A system is provided for natural language processing. In some embodiments, the system includes an encoder for generating context-specific word vectors for at least one input sequence of words. The encoder is pre-trained using training data for performing a first natural language processing task. A neural network performs a second natural language processing task on the at least one input sequence of words using the context-specific word vectors. The first natural language processing task is different from the second natural language processing task, and the neural network is separately trained from the encoder. In some embodiments, the first natural language processing task can be machine translation, and the second natural language processing task can be one of sentiment analysis, question classification, entailment classification, and question answering.
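A sketch of that transfer setup: a frozen, separately trained encoder (the biLSTM below is a stand-in, not the patent's actual MT-pretrained encoder) produces context-specific word vectors that a small downstream head consumes; all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class SentimentHead(nn.Module):
    # Downstream classifier consuming context-specific word vectors.
    def __init__(self, enc_dim, num_classes=2):
        super().__init__()
        self.classifier = nn.Linear(enc_dim, num_classes)

    def forward(self, context_vectors):                       # (batch, seq, enc_dim)
        return self.classifier(context_vectors.mean(dim=1))   # average pooling

# Stand-in for the separately pre-trained encoder (frozen at transfer time).
encoder = nn.LSTM(300, 150, bidirectional=True, batch_first=True)
for p in encoder.parameters():
    p.requires_grad = False

embedded = torch.randn(4, 12, 300)      # toy batch of embedded words
context_vectors, _ = encoder(embedded)  # context-specific word vectors
logits = SentimentHead(enc_dim=300)(context_vectors)
```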
-
Publication Number: US11276002B2
Publication Date: 2022-03-15
Application Number: US15926768
Application Date: 2018-03-20
Applicant: salesforce.com, inc.
Inventor: Nitish Shirish Keskar , Richard Socher
Abstract: Hybrid training of deep networks applies two learning algorithms in sequence to a multi-layer neural network. The training includes setting a current learning algorithm for the multi-layer neural network to a first learning algorithm. The training further includes iteratively applying training data to the neural network, determining a gradient for parameters of the neural network based on the applying of the training data, updating the parameters based on the current learning algorithm, and determining whether the current learning algorithm should be switched to a second learning algorithm based on the updating. The training further includes, in response to determining that the current learning algorithm should be switched, changing the current learning algorithm to the second learning algorithm and initializing a learning rate of the second learning algorithm based on the gradient and a step used by the first learning algorithm to update the parameters of the neural network.
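A sketch of the learning-rate initialization at the switch point, projecting the first algorithm's last update step onto the gradient, in the style of the SWATS heuristic from the inventors' related published work on switching from Adam to SGD; the values below are toys:

```python
import torch

def sgd_lr_from_adam_step(step, grad):
    # Project the first algorithm's last update step onto the gradient to
    # estimate a scale for SGD: lr = -(step . step) / (step . grad).
    return -(step @ step) / (step @ grad)

grad = torch.randn(1000)
step = -0.001 * grad / (grad.abs() + 1e-8)       # toy Adam-like update direction
print(float(sgd_lr_from_adam_step(step, grad)))  # positive lr estimate
```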
-