-
Publication No.: US20240037939A1
Publication date: 2024-02-01
Application No.: US18487183
Application date: 2023-10-16
Applicant: ADOBE INC.
Inventor: Quan Hung TRAN, Long Thanh MAI, Zhe LIN, Zhuowan LI
IPC: G06V20/30, G06F16/55, G06F16/535, G06F40/205, G06V10/75, G06F18/214, G06V10/82
CPC classification number: G06V20/30, G06F16/55, G06F16/535, G06F40/205, G06V10/751, G06F18/214, G06V10/82
Abstract: A group captioning system includes computing hardware, software, and/or firmware components in support of the enhanced group captioning contemplated herein. In operation, the system generates a target embedding for a group of target images, as well as a reference embedding for a group of reference images. The system identifies information in common between the group of target images and the group of reference images and removes this joint information from the target embedding and the reference embedding. The result is a contrastive target embedding and a contrastive reference embedding, which are used to construct a contrastive group embedding that is then input to a model to obtain a group caption for the target group of images.
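The embedding steps in this abstract can be illustrated with a minimal sketch. The abstract does not say how the group embeddings or the joint information are computed, so everything here is an assumption made for illustration: the function names are hypothetical, group embeddings are taken as the mean of per-image feature vectors, the joint information is approximated by the average of the two group embeddings, and the contrastive group embedding is formed by concatenation. The captioning model itself is omitted.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(features: Sequence[Sequence[float]]) -> Vector:
    """Average per-image feature vectors into one group embedding (assumed pooling)."""
    if not features:
        raise ValueError("group must contain at least one image feature vector")
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def contrastive_group_embedding(
    target_feats: Sequence[Sequence[float]],
    reference_feats: Sequence[Sequence[float]],
) -> Vector:
    """Build a contrastive group embedding from two groups of image features."""
    target = mean_pool(target_feats)
    reference = mean_pool(reference_feats)

    # Assumption: the information in common is approximated by the
    # average of the two group embeddings.
    joint = [(t + r) / 2.0 for t, r in zip(target, reference)]

    # Remove the joint information from each embedding.
    contrastive_target = [t - j for t, j in zip(target, joint)]
    contrastive_reference = [r - j for r, j in zip(reference, joint)]

    # Assumption: the group embedding is the concatenation of the two parts.
    return contrastive_target + contrastive_reference


if __name__ == "__main__":
    target_images = [[1.0, 2.0], [3.0, 4.0]]
    reference_images = [[0.0, 0.0], [2.0, 2.0]]
    print(contrastive_group_embedding(target_images, reference_images))
```

In a real system the resulting vector would be passed to a caption decoder; the subtraction step is what focuses the embedding on how the target group differs from the reference group.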
-
2.
Publication No.: US20210375277A1
Publication date: 2021-12-02
Application No.: US16889669
Application date: 2020-06-01
Applicant: Adobe Inc.
Inventor: Tuan Manh LAI, Trung BUI, Quan Hung TRAN
Abstract: A computer-implemented method is disclosed for determining one or more characteristics of a dialog between a computer system and a user. The method may comprise receiving a system utterance comprising one or more tokens defining one or more words generated by the computer system; receiving a user utterance comprising one or more tokens defining one or more words uttered by the user in response to the system utterance, the system utterance and the user utterance forming a dialog context; receiving one or more utterance candidates, each comprising one or more tokens; for each utterance candidate, generating an input sequence combining the one or more tokens of each of the system utterance, the user utterance, and the utterance candidate; and for each utterance candidate, evaluating the generated input sequence with a model to determine a probability that the utterance candidate is relevant to the dialog context.
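The per-candidate loop described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the `[CLS]` and `[SEP]` marker tokens, the function names, and the toy scoring function are all hypothetical. The abstract only says that the tokens are combined into one input sequence and evaluated with a model; a trained relevance model would take the place of `toy_overlap_model`.

```python
from typing import Callable, List, Sequence


def build_input_sequence(
    system_tokens: Sequence[str],
    user_tokens: Sequence[str],
    candidate_tokens: Sequence[str],
) -> List[str]:
    """Combine system utterance, user utterance, and one candidate into a sequence.

    Assumption: segments are delimited with [CLS]/[SEP] markers.
    """
    return ["[CLS]", *system_tokens, "[SEP]", *user_tokens, "[SEP]",
            *candidate_tokens, "[SEP]"]


def score_candidates(
    system_tokens: Sequence[str],
    user_tokens: Sequence[str],
    candidates: Sequence[Sequence[str]],
    model: Callable[[List[str]], float],
) -> List[float]:
    """Return one relevance probability per utterance candidate."""
    return [
        model(build_input_sequence(system_tokens, user_tokens, candidate))
        for candidate in candidates
    ]


def toy_overlap_model(sequence: List[str]) -> float:
    """Stand-in scorer: fraction of candidate tokens found in the dialog context."""
    last_sep = len(sequence) - 1
    # Index of the [SEP] that separates the dialog context from the candidate.
    prev_sep = max(i for i, tok in enumerate(sequence[:last_sep]) if tok == "[SEP]")
    context = set(sequence[1:prev_sep]) - {"[SEP]"}
    candidate = sequence[prev_sep + 1:last_sep]
    if not candidate:
        return 0.0
    return sum(tok in context for tok in candidate) / len(candidate)


if __name__ == "__main__":
    scores = score_candidates(
        ["which", "city"], ["paris", "please"],
        [["paris"], ["london"]],
        toy_overlap_model,
    )
    print(scores)
```

Passing the model as a callable keeps the sequence construction separate from the scoring, so the same loop works whether the scorer is this toy heuristic or a learned classifier.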
-