-
Publication No.: US20240404283A1
Publication Date: 2024-12-05
Application No.: US18328597
Application Date: 2023-06-02
Applicant: Adobe Inc.
Inventor: Zhaowen WANG, Trung BUI, Bo HE
IPC: G06V20/40, G06F40/166, G06F40/40, G06V10/774, G06V10/776, G06V10/80
Abstract: A method includes receiving a video input and a text transcription of the video input. The video input includes a plurality of frames and the text transcription includes a plurality of sentences. The method further includes determining, by a multimodal summarization model, a subset of key frames of the plurality of frames and a subset of key sentences of the plurality of sentences. The method further includes providing a summary of the video input and a summary of the text transcription based on the subset of key frames and the subset of key sentences.
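The abstract does not say how the key frames and key sentences are picked once the model has processed the input. As a minimal sketch, assuming the multimodal summarization model (not shown) has already produced one relevance score per frame and per sentence, the selection step could keep the top-k items of each modality in their original temporal order. The top-k rule and the names `select_key_indices` and `summarize` are hypothetical illustrations, not taken from the patent.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_key_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest-scoring items, restored to original order.

    Hypothetical selection rule: the scores are assumed to come from the
    multimodal summarization model, which is not implemented here.
    """
    k = max(0, min(k, len(scores)))
    # Rank by score (descending); ties go to the earlier position.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Re-sort the kept indices so the summary follows the video timeline.
    return sorted(ranked[:k])


def summarize(
    frames: Sequence[T],
    frame_scores: Sequence[float],
    sentences: Sequence[str],
    sentence_scores: Sequence[float],
    k_frames: int,
    k_sentences: int,
) -> Tuple[List[T], str]:
    """Build a video summary (key frames) and a transcript summary (key sentences)."""
    if len(frames) != len(frame_scores) or len(sentences) != len(sentence_scores):
        raise ValueError("each frame and sentence needs exactly one score")
    key_frames = [frames[i] for i in select_key_indices(frame_scores, k_frames)]
    key_sentences = [sentences[i] for i in select_key_indices(sentence_scores, k_sentences)]
    return key_frames, " ".join(key_sentences)


# Usage with made-up scores standing in for model output.
frames, text = summarize(
    frames=["f0", "f1", "f2", "f3"],
    frame_scores=[0.1, 0.9, 0.4, 0.8],
    sentences=["A.", "B.", "C."],
    sentence_scores=[0.7, 0.2, 0.9],
    k_frames=2,
    k_sentences=2,
)
print(frames, text)  # ['f1', 'f3'] A. C.
```

Keeping the selected items in temporal order is a design choice for readability of the summary; a real system might instead select jointly across modalities.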
-
Publication No.: US20240304009A1
Publication Date: 2024-09-12
Application No.: US18179177
Application Date: 2023-03-06
Applicant: Adobe Inc.
Inventor: Seunghyun YOON, Trung BUI
CPC classification number: G06V20/70, G06F40/58, G06T1/0021
Abstract: Embodiments are disclosed for training an image caption evaluation system to perform evaluations of image captions. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a training image, a ground truth image caption for the training image, and a perturbed image caption for the training image, where the perturbed image caption includes modifications to the ground truth image caption. The disclosed systems and methods further comprise generating, by a visual encoder, a visual embedding representation of the training image and generating, by a perturbation-aware text encoder, a first text embedding for the ground truth image caption and a second text embedding for the perturbed image caption. The disclosed systems and methods further comprise computing losses between the visual embedding, the first text embedding, and the second text embedding and training the perturbation-aware text encoder based on the computed losses.
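The abstract states only that losses are computed between the visual embedding and the two text embeddings. One plausible instance, assumed here rather than taken from the patent, is a hinge (margin ranking) loss that requires the image to be closer, by cosine similarity, to the ground truth caption than to the perturbed caption. The function names and the `margin` value are hypothetical, the embeddings are plain lists rather than encoder outputs, and the gradient step that would update the perturbation-aware text encoder is not shown.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def perturbation_margin_loss(
    visual: Sequence[float],
    gt_text: Sequence[float],
    perturbed_text: Sequence[float],
    margin: float = 0.2,  # hypothetical value, not from the patent
) -> float:
    """Hinge loss that is zero once the image is closer to the ground truth
    caption than to the perturbed caption by at least `margin`."""
    s_pos = cosine_similarity(visual, gt_text)         # image vs. ground truth caption
    s_neg = cosine_similarity(visual, perturbed_text)  # image vs. perturbed caption
    return max(0.0, margin - (s_pos - s_neg))


# Usage: a well-separated pair costs nothing; a reversed pair is penalized.
print(perturbation_margin_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))  # 0.0
print(perturbation_margin_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))  # 1.2
```

Anchoring both similarities on the image is what lets such a loss penalize a text encoder that fails to separate a caption from its perturbed variant.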
-
Publication No.: US20210375277A1
Publication Date: 2021-12-02
Application No.: US16889669
Application Date: 2020-06-01
Applicant: Adobe Inc.
Inventor: Tuan Manh LAI, Trung BUI, Quan Hung TRAN
Abstract: A computer-implemented method is disclosed for determining one or more characteristics of a dialog between a computer system and user. The method may comprise receiving a system utterance comprising one or more tokens defining one or more words generated by the computer system; receiving a user utterance comprising one or more tokens defining one or more words uttered by a user in response to the system utterance, the system utterance and the user utterance forming a dialog context; receiving one or more utterance candidates comprising one or more tokens; for each utterance candidate, generating an input sequence combining the one or more tokens of each of the system utterance, the user utterance, and the utterance candidate; and for each utterance candidate, evaluating the generated input sequence with a model to determine a probability that the utterance candidate is relevant to the dialog context.
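A minimal sketch of the per-candidate loop described in the abstract, assuming BERT-style `[CLS]`/`[SEP]` separators when combining the three token lists and a sigmoid over a model logit to obtain the probability. The patent abstract specifies neither choice, and `toy_relevance_logit` is a hypothetical placeholder (token overlap with the dialog context) standing in for the trained model.

```python
import math
from typing import Callable, List, Sequence

# Assumed BERT-style special tokens; the abstract does not name any.
CLS, SEP = "[CLS]", "[SEP]"


def build_input_sequence(
    system_tokens: Sequence[str],
    user_tokens: Sequence[str],
    candidate_tokens: Sequence[str],
) -> List[str]:
    """Combine system utterance, user utterance, and candidate into one sequence."""
    return [CLS, *system_tokens, SEP, *user_tokens, SEP, *candidate_tokens, SEP]


def toy_relevance_logit(sequence: Sequence[str]) -> float:
    """Hypothetical stand-in for the trained model: rewards candidate tokens
    that also occur in the dialog context. Returns an unnormalized logit."""
    segments: List[List[str]] = [[]]
    for tok in sequence[1:]:  # skip [CLS]
        if tok == SEP:
            segments.append([])  # start the next segment
        else:
            segments[-1].append(tok)
    system, user, candidate = segments[0], segments[1], segments[2]
    context = set(system) | set(user)
    overlap = sum(1 for tok in candidate if tok in context)
    return overlap - 1.0


def score_candidates(
    system_tokens: Sequence[str],
    user_tokens: Sequence[str],
    candidates: Sequence[Sequence[str]],
    model: Callable[[Sequence[str]], float] = toy_relevance_logit,
) -> List[float]:
    """Return, for each candidate, the probability that it is relevant to the dialog context."""
    probabilities = []
    for candidate in candidates:
        sequence = build_input_sequence(system_tokens, user_tokens, candidate)
        logit = model(sequence)
        probabilities.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid
    return probabilities


probs = score_candidates(
    ["what", "size", "pizza"],
    ["large", "please"],
    [["large", "pizza", "confirmed"], ["sunny", "weather"]],
)
print([round(p, 3) for p in probs])  # [0.731, 0.269]
```

Because each candidate is scored independently against the same context, the `model` argument can be swapped for any callable that maps a token sequence to a logit.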
-