Publication No.: US20250095393A1
Publication Date: 2025-03-20
Application No.: US18470778
Filing Date: 2023-09-20
Applicant: ADOBE INC.
Inventor: Ziyan Yang, Kushal Kafle, Zhe Lin, Scott Cohen, Zhihong Ding
IPC: G06V20/70, G06F40/205, G06V10/25, G06V10/774
Abstract: A method, apparatus, and non-transitory computer readable medium for image processing are described. Embodiments of the present disclosure obtain an image and an input text including a subject from the image and a location of the subject in the image. An image encoder encodes the image to obtain an image embedding. A text encoder encodes the input text to obtain a text embedding. An image processing apparatus based on the present disclosure generates an output text based on the image embedding and the text embedding. In some examples, the output text includes a relation of the subject to an object from the image and a location of the object in the image.
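The data flow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the application: the `GroundedPhrase` and `RelationOutput` types, the `(x_min, y_min, x_max, y_max)` box convention, the bracketed input-text format, and the stub encoders and decoder, which stand in for learned models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class GroundedPhrase:
    """A named entity from the image together with its location."""
    name: str
    box: Box


@dataclass(frozen=True)
class RelationOutput:
    """Output text contents: a relation plus the related object and its location."""
    relation: str
    obj: GroundedPhrase


def format_input_text(subject: GroundedPhrase) -> str:
    """Serialize the subject and its location into one string (format is assumed)."""
    coords = ",".join(f"{v:g}" for v in subject.box)
    return f"{subject.name} [{coords}]"


def describe_relation(
    image: object,
    subject: GroundedPhrase,
    image_encoder: Callable[[object], List[float]],
    text_encoder: Callable[[str], List[float]],
    decoder: Callable[[List[float], List[float]], RelationOutput],
) -> RelationOutput:
    """Encode the image and the input text, then generate output from both embeddings."""
    input_text = format_input_text(subject)
    image_embedding = image_encoder(image)
    text_embedding = text_encoder(input_text)
    return decoder(image_embedding, text_embedding)


# Stub components so the sketch runs end to end; real ones would be trained models.
def stub_image_encoder(image: object) -> List[float]:
    return [0.0, 1.0]


def stub_text_encoder(text: str) -> List[float]:
    return [float(len(text))]


def stub_decoder(image_emb: List[float], text_emb: List[float]) -> RelationOutput:
    # Returns a fixed answer; a real decoder would condition on both embeddings.
    return RelationOutput("holding", GroundedPhrase("frisbee", (40, 60, 90, 100)))


subject = GroundedPhrase("dog", (10, 20, 110, 220))
result = describe_relation(
    image=None,
    subject=subject,
    image_encoder=stub_image_encoder,
    text_encoder=stub_text_encoder,
    decoder=stub_decoder,
)
print(format_input_text(subject), "->", result.relation, format_input_text(result.obj))
```

Passing the encoders and decoder as callables keeps the sketch independent of any particular model library; only the order of operations (encode image, encode text, generate from both embeddings) follows the abstract.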