Publication number: US20240304009A1
Publication date: 2024-09-12
Application number: US18179177
Filing date: 2023-03-06
Applicant: Adobe Inc.
Inventor: Seunghyun YOON, Trung BUI
CPC classification number: G06V20/70, G06F40/58, G06T1/0021
Abstract: Embodiments are disclosed for training an image caption evaluation system to perform evaluations of image captions. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a training image, a ground truth image caption for the training image, and a perturbed image caption for the training image, where the perturbed image caption includes modifications to the ground truth image caption. The disclosed systems and methods further comprise generating, by a visual encoder, a visual embedding representation of the training image and generating, by a perturbation-aware text encoder, a first text embedding for the ground truth image caption and a second text embedding for the perturbed image caption. The disclosed systems and methods further comprise computing losses between the visual embedding, the first text embedding, and the second text embedding and training the perturbation-aware text encoder based on the computed losses.