-
Publication No.: US10958925B2
Publication date: 2021-03-23
Application No.: US16687405
Application date: 2019-11-18
Applicant: salesforce.com, inc.
Inventor: Yingbo Zhou, Luowei Zhou, Caiming Xiong, Richard Socher
IPC: H04N19/46, H04N19/44, H04N19/60, H04N19/187, H04N21/81, H04N19/33, H04N19/126, H04N19/132, H04N21/488
Abstract: Systems and methods for dense captioning of a video include a multi-layer encoder stack configured to receive information extracted from a plurality of video frames, a proposal decoder coupled to the encoder stack and configured to receive one or more outputs from the encoder stack, a masking unit configured to mask the one or more outputs from the encoder stack according to one or more outputs from the proposal decoder, and a decoder stack coupled to the masking unit and configured to receive the masked one or more outputs from the encoder stack. The dense captioning is generated based on one or more outputs of the decoder stack. In some embodiments, the one or more outputs from the proposal decoder include a differentiable mask. In some embodiments, during training, error in the dense captioning is backpropagated to the decoder stack, the encoder stack, and the proposal decoder.
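The masking step can be illustrated with a minimal sketch. The abstract does not give the form of the differentiable mask, so the product of two sigmoids below is an assumption chosen only because it is smooth in the proposal boundaries; the names `soft_mask`, `apply_mask`, and `sharpness` are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def soft_mask(num_frames: int, start: float, end: float,
              sharpness: float = 5.0) -> List[float]:
    """Smooth 0..1 weight per frame, close to 1 inside [start, end].

    Assumed form (not taken from the patent): a product of two sigmoids,
    which is differentiable with respect to `start` and `end`.
    """
    return [
        sigmoid(sharpness * (t - start)) * sigmoid(sharpness * (end - t))
        for t in range(num_frames)
    ]


def apply_mask(encoder_outputs: List[List[float]],
               mask: List[float]) -> List[List[float]]:
    """Scale each frame's feature vector by its mask weight."""
    return [[m * v for v in frame] for frame, m in zip(encoder_outputs, mask)]


# Toy encoder outputs: 10 frames, 2 features each.
features = [[1.0, 2.0] for _ in range(10)]
mask = soft_mask(num_frames=10, start=2.5, end=6.5)
masked = apply_mask(features, mask)
```

Because the mask is a smooth function of the proposal boundaries, a caption loss computed on `masked` could in principle send gradients back to whatever produced `start` and `end`, which is the property the abstract attributes to the differentiable mask.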
-
Publication No.: US20190130897A1
Publication date: 2019-05-02
Application No.: US15878113
Application date: 2018-01-23
Applicant: salesforce.com, inc.
Inventor: Yingbo Zhou, Caiming Xiong
Abstract: The disclosed technology teaches a deep end-to-end speech recognition model, including using multi-objective learning criteria to train a deep end-to-end speech recognition model on training data comprising speech samples temporally labeled with ground truth transcriptions. The multi-objective learning criteria updates model parameters of the model over one thousand to millions of backpropagation iterations by combining, at each iteration, a maximum likelihood objective function that modifies the model parameters to maximize a probability of outputting a correct transcription and a policy gradient function that modifies the model parameters to maximize a positive reward defined based on a non-differentiable performance metric which penalizes incorrect transcriptions in accordance with their conformity to corresponding ground truth transcriptions; and upon convergence after a final backpropagation iteration, persisting the modified model parameters learned by using the multi-objective learning criteria with the model to be applied to further end-to-end speech recognition.
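A minimal sketch of how the two objectives might be combined for a single training example follows. The abstract does not name the performance metric or the weighting, so the edit-distance-based reward and the `weight` parameter are assumptions, and the log-probabilities are passed in as plain numbers rather than computed by a model.

```python
import math
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def reward(hypothesis: Sequence, reference: Sequence) -> float:
    """Assumed reward: 1 minus normalized edit distance, floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - edit_distance(hypothesis, reference) / len(reference))


def combined_loss(logp_reference: float, logp_sample: float,
                  sample: Sequence, reference: Sequence,
                  weight: float = 1.0) -> float:
    """Maximum-likelihood term plus a REINFORCE-style policy-gradient term.

    The reward is non-differentiable and is treated as a constant; only the
    log-probabilities would carry gradients in a real training loop.
    """
    ml_loss = -logp_reference
    pg_loss = -reward(sample, reference) * logp_sample
    return ml_loss + weight * pg_loss


loss = combined_loss(
    logp_reference=math.log(0.5),   # model log-probability of the ground truth
    logp_sample=math.log(0.25),     # model log-probability of a sampled output
    sample=list("cut"),
    reference=list("cat"),
)
```

Minimizing the first term raises the probability of the correct transcription, while minimizing the second raises the probability of sampled transcriptions in proportion to their reward.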
-
Publication No.: US12164878B2
Publication date: 2024-12-10
Application No.: US17581380
Application date: 2022-01-21
Applicant: Salesforce.com, Inc.
Inventor: Tong Niu, Kazuma Hashimoto, Yingbo Zhou, Caiming Xiong
IPC: G06F40/51
Abstract: Embodiments described herein provide a cross-lingual sentence alignment framework that is trained only on rich-resource language pairs. To obtain an accurate aligner, a pretrained multi-lingual language model is used, and a classifier is trained on parallel data from rich-resource language pairs. This trained classifier may then be used for cross-lingual transfer with low-resource languages.
-
Publication No.: US11676022B2
Publication date: 2023-06-13
Application No.: US17460691
Application date: 2021-08-30
Applicant: salesforce.com, inc.
Inventor: Ehsan Hosseini-Asl, Caiming Xiong, Yingbo Zhou, Richard Socher
IPC: G05B13/02, G10L21/003, G10L15/07, G10L15/065, G06N3/02, G06F18/21
CPC classification number: G05B13/027, G06N3/02, G10L21/003, G06F18/2178, G10L15/065, G10L15/075
Abstract: A method for training parameters of a first domain adaptation model. The method includes evaluating a cycle consistency objective using a first task specific model associated with a first domain and a second task specific model associated with a second domain, and evaluating one or more first discriminator models to generate a first discriminator objective using the second task specific model. The one or more first discriminator models include a plurality of discriminators corresponding to a plurality of bands that correspond to domain variable ranges of the first and second domains respectively. The method further includes updating, based on the cycle consistency objective and the first discriminator objective, one or more parameters of the first domain adaptation model for adapting representations from the first domain to the second domain.
-
Publication No.: US20230059870A1
Publication date: 2023-02-23
Application No.: US17565305
Application date: 2021-12-29
Applicant: salesforce.com, inc.
Inventor: Xi Ye, Semih Yavuz, Kazuma Hashimoto, Yingbo Zhou
Abstract: Embodiments described herein provide a question answering approach that answers a question by generating an executable logical form. First, a ranking model is used to select a set of good logical forms from a pool of logical forms obtained by searching over a knowledge graph. The selected logical forms are good in the sense that they are close to (or exactly match, in some cases) the intents in the question and final desired logical form. Next, a generation model is adopted conditioned on the question as well as the selected logical forms to generate the target logical form and execute it to obtain the final answer. For example, at the inference stage, when a question is received, a matching logical form is identified from the question, and the final answer is then generated from the node associated with the matching logical form in the knowledge base.
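The rank-then-generate flow can be sketched as follows. The abstract says only that a ranking model selects candidates and that a generation model is conditioned on the question plus the selected forms. Here a token-overlap (Jaccard) score stands in for the learned ranker, the generator is omitted, and the candidate logical forms and the input layout are hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens; punctuation and dots act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(question: str, candidates: List[str], k: int = 2) -> List[str]:
    """Stand-in ranker: keep the k candidates most similar to the question."""
    q = tokens(question)
    ranked = sorted(candidates, key=lambda c: jaccard(q, tokens(c)), reverse=True)
    return ranked[:k]


def build_generator_input(question: str, top_forms: List[str]) -> str:
    """Concatenate the question and the selected forms (layout is an assumption)."""
    return question + " | " + " ; ".join(top_forms)


question = "who directed the film inception"
pool = [
    "(JOIN film.directed_by inception)",
    "(JOIN music.album.artist inception)",
    "(COUNT book.author)",
]
selected = rank_candidates(question, pool, k=2)
generator_input = build_generator_input(question, selected)
```

In the described approach, a generation model would take something like `generator_input` and emit the target logical form, which is then executed against the knowledge graph.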
-
Publication No.: US20230054068A1
Publication date: 2023-02-23
Application No.: US17589522
Application date: 2022-01-31
Applicant: salesforce.com, inc.
Inventor: Haopeng Zheng, Semih Yavuz, Wojciech Kryscinski, Kazuma Hashimoto, Yingbo Zhou
IPC: G06F40/166, G06F40/279, G06F40/117, G06N20/00
Abstract: Embodiments described herein provide document summarization systems and methods that utilize fine-tuning of pre-trained abstractive summarization models to produce summaries that more faithfully track the content of the documents. Such abstractive summarization models may be pre-trained using a corpus consisting of pairs of articles and associated summaries. For each article-summary pair, a pseudo label or control code is generated and represents a faithfulness of the summary with respect to the article. The pre-trained model is then fine-tuned based on the article-summary pairs and the corresponding control codes. The resulting fine-tuned models then provide improved faithfulness in document summarization tasks.
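A minimal sketch of the control-code step follows. The abstract does not say how faithfulness is measured, so the token-overlap proxy, the `0.8` threshold, and the `<faithful>` / `<unfaithful>` code strings are all assumptions made for illustration.

```python
from typing import Dict


def token_overlap(article: str, summary: str) -> float:
    """Assumed faithfulness proxy: share of summary tokens that occur in the article."""
    article_tokens = set(article.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    return sum(t in article_tokens for t in summary_tokens) / len(summary_tokens)


def control_code(score: float, threshold: float = 0.8) -> str:
    """Map a faithfulness score to a hypothetical control-code token."""
    return "<faithful>" if score >= threshold else "<unfaithful>"


def build_example(article: str, summary: str) -> Dict[str, str]:
    """Prepend the control code to the article to form one fine-tuning example."""
    code = control_code(token_overlap(article, summary))
    return {"input": f"{code} {article}", "target": summary}


article = "the cat sat on the mat"
good = build_example(article, "the cat sat")
bad = build_example(article, "a dog ran away")
```

Under this scheme, a model fine-tuned on such examples could be prompted with the `<faithful>` code at inference time to steer it toward summaries that track the source.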
-
Publication No.: US20220101844A1
Publication date: 2022-03-31
Application No.: US17037556
Application date: 2020-09-29
Applicant: salesforce.com, inc.
Inventor: Xinyi Yang, Tian Xie, Caiming Xiong, Wenhao Liu, Huan Wang, Kazuma Hashimoto, Yingbo Zhou, Xugang Ye, Jin Qu, Feihong Wu
Abstract: A conversation engine performs conversations with users using chatbots customized for performing a set of tasks that can be performed using an online system. The conversation engine loads a chatbot configuration that specifies the behavior of a chatbot including the tasks that can be performed by the chatbot, the types of entities relevant to each task, and so on. The conversation may be voice based and use natural language. The conversation engine may load different chatbot configurations to implement different chatbots. The conversation engine receives a conversation engine configuration that specifies the behavior of the conversation engine across chatbots. The system may be a multi-tenant system that allows customization of the chatbots for each tenant.
-
Publication No.: US20210357687A1
Publication date: 2021-11-18
Application No.: US16931228
Application date: 2020-07-16
Applicant: salesforce.com, inc.
Inventor: Mingfei Gao, Yingbo Zhou, Ran Xu, Caiming Xiong
Abstract: Embodiments described herein provide systems and methods for a partially supervised training model for online action detection. Specifically, the online action detection framework may include two modules that are trained jointly—a Temporal Proposal Generator (TPG) and an Online Action Recognizer (OAR). In the training phase, OAR performs both online per-frame action recognition and start point detection. At the same time, TPG generates class-wise temporal action proposals serving as noisy supervisions for OAR. TPG is then optimized with the video-level annotations. In this way, the online action detection framework can be trained with video-category labels only without pre-annotated segment-level boundary labels.
-
Publication No.: US11829721B2
Publication date: 2023-11-28
Application No.: US17161214
Application date: 2021-01-28
Applicant: salesforce.com, inc.
Inventor: Tong Niu, Semih Yavuz, Yingbo Zhou, Nitish Shirish Keskar, Huan Wang, Caiming Xiong
IPC: G10L15/065, G06N3/0455, G06F18/20, G06F40/20, G06F40/289, G06F40/45, G06F40/284, G06F40/242, G06F18/22, G06F18/214, G06N7/01
CPC classification number: G06F40/284, G06F18/214, G06F18/22, G06F40/242, G06N7/01
Abstract: Embodiments described herein provide dynamic blocking, a decoding algorithm which enables large-scale pretrained language models to generate high-quality paraphrases in an unsupervised setting. Specifically, in order to obtain an alternative surface form, when the language model emits a token that is present in the source sequence, the language model is prevented from generating the next token that is the same as the subsequent source token in the source sequence at the next time step. In this way, the language model is forced to generate a paraphrased sequence of the input source sequence, but with mostly different wording.
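The blocking rule described above can be sketched with a toy greedy decoder. This is an illustration of the rule as stated in the abstract, not the patented implementation: the scoring function, the synonym table, and the `</s>` end marker are hypothetical stand-ins for a real language model.

```python
from typing import Callable, Dict, List, Set

EOS = "</s>"  # hypothetical end-of-sequence marker


def blocked_successors(source: List[str], last_token: str) -> Set[str]:
    """Tokens that immediately follow `last_token` somewhere in the source."""
    return {source[i + 1] for i in range(len(source) - 1)
            if source[i] == last_token}


def decode_with_blocking(source: List[str],
                         score_fn: Callable[[List[str]], Dict[str, float]],
                         max_len: int = 20) -> List[str]:
    """Greedy decoding that forbids copying the source token that follows
    the token just emitted."""
    output: List[str] = []
    while len(output) < max_len:
        scores = dict(score_fn(output))
        if output:
            for tok in blocked_successors(source, output[-1]):
                scores.pop(tok, None)  # block the subsequent source token
        if not scores:
            break
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        output.append(best)
    return output


# Toy "model" (an assumption): prefers to copy the source, with a synonym
# as a lower-scoring alternative at each position.
SYNONYMS = {"the": "a", "cat": "feline", "sat": "rested"}


def toy_scores(source: List[str]) -> Callable[[List[str]], Dict[str, float]]:
    def score_fn(prefix: List[str]) -> Dict[str, float]:
        pos = len(prefix)
        if pos >= len(source):
            return {EOS: 1.0}
        tok = source[pos]
        return {tok: 1.0, SYNONYMS[tok]: 0.5}
    return score_fn


source = ["the", "cat", "sat"]
paraphrase = decode_with_blocking(source, toy_scores(source))
print(paraphrase)  # ['the', 'feline', 'sat']
```

After the decoder emits `the`, the source successor `cat` is blocked, so the lower-scoring `feline` is chosen. Because `feline` is not in the source, nothing is blocked at the next step and `sat` is copied.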
-
Publication No.: US20230153542A1
Publication date: 2023-05-18
Application No.: US17581380
Application date: 2022-01-21
Applicant: salesforce.com, inc.
Inventor: Tong Niu, Kazuma Hashimoto, Yingbo Zhou, Caiming Xiong
IPC: G06F40/51
CPC classification number: G06F40/51
Abstract: Embodiments described herein provide a cross-lingual sentence alignment framework that is trained only on rich-resource language pairs. To obtain an accurate aligner, a pretrained multi-lingual language model is used, and a classifier is trained on parallel data from rich-resource language pairs. This trained classifier may then be used for cross-lingual transfer with low-resource languages.
-