Patent search ap:("Adobe Inc.") AND inv:"Koustava Goswami" Page 1

1.

Patent application
TEXT-TO-IMAGE SYNTHESIS UTILIZING DIFFUSION MODELS WITH TEST-TIME ATTENTION SEGREGATION AND RETENTION OPTIMIZATION (Active)

Publication No.: US20240428468A1

Publication date: 2024-12-26

Application No.: US18337634

Filing date: 2023-06-20

Applicant: Adobe Inc.

Inventors: Aishwarya Agarwal, Srikrishna Karanam, Joseph Koonthanam Jose, Apoorv Umang Saxena, Koustava Goswami, Balaji Vasan Srinivasan

IPC: G06T11/00, G06N3/0455

Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize attention segregation loss and/or attention retention loss at inference time of a diffusion neural network to generate a text-conditioned image. In particular, in some embodiments, the disclosed systems utilize the attention segregation loss to reduce overlap between concepts by comparing attention maps for multiple concepts of a text query corresponding to a denoising step. Further, in some embodiments, the disclosed systems utilize the attention retention loss to improve information retention for concepts across denoising steps by comparing attention maps between different denoising steps. Accordingly, in some embodiments, by utilizing the attention segregation loss and the attention retention loss, the disclosed systems accurately maintain multiple concepts from a text query when generating a text-conditioned image.
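
Illustrative sketch (not taken from the patent): the abstract does not state how the two losses are computed, so the snippet below assumes a soft intersection-over-union (IoU) between attention maps as the comparison. The function names `soft_iou`, `segregation_loss`, and `retention_loss` are hypothetical, and attention maps are modeled as plain nested lists of non-negative floats.

```python
from itertools import combinations


def soft_iou(a, b):
    """Soft IoU of two non-negative 2D maps of the same shape (assumed overlap measure)."""
    inter = sum(min(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(max(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union > 0 else 0.0


def segregation_loss(concept_maps):
    """Mean pairwise overlap between the attention maps of different concepts
    at one denoising step. Lower values mean the concepts are better separated."""
    pairs = list(combinations(concept_maps, 2))
    if not pairs:
        return 0.0
    return sum(soft_iou(a, b) for a, b in pairs) / len(pairs)


def retention_loss(earlier_maps, current_maps):
    """Mean of (1 - overlap) between each concept's map at an earlier denoising
    step and at the current step. Lower values mean more information is retained."""
    if not earlier_maps:
        return 0.0
    total = sum(1.0 - soft_iou(p, c) for p, c in zip(earlier_maps, current_maps))
    return total / len(earlier_maps)


# Toy 2x2 attention maps for two concepts (hypothetical values).
cat = [[1.0, 0.0], [0.0, 0.0]]
dog = [[0.0, 0.0], [0.0, 1.0]]

print(segregation_loss([cat, dog]))           # disjoint maps: no overlap
print(segregation_loss([cat, cat]))           # identical maps: full overlap
print(retention_loss([cat, dog], [cat, dog])) # unchanged maps: nothing lost
```

In an inference-time setting such as the one the abstract describes, losses of this kind would presumably be combined and used to adjust the latent at each denoising step; that update step is outside this sketch.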

2.

Patent application
GENERATING TEXT PROMPTS FOR DIGITAL IMAGES UTILIZING VISION-LANGUAGE MODELS AND CONTEXTUAL PROMPT LEARNING (Active)

Publication No.: US20250005296A1

Publication date: 2025-01-02

Application No.: US18342954

Filing date: 2023-06-28

Applicant: Adobe Inc.

Inventors: Koustava Goswami, Srikrishna Karanam, Joseph Koonthanam Jose, Prateksha Udhayanan, Balaji Vasan Srinivasan

IPC: G06F40/40, G06V10/77, G06V10/82

Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that implement a vision language machine learning model to generate text representations of an input digital image from localized context tokens. In particular, in some embodiments, the disclosed systems generate image patch feature representations that represent patches from an input image. Further, in some embodiments, the disclosed systems generate localized context tokens from the image patch feature representations and prompt context tokens. Moreover, in some embodiments, by utilizing the localized context tokens, the disclosed systems generate a text representation by utilizing a text encoder of the vision language machine learning model.
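
Illustrative sketch (not taken from the patent): the abstract does not say how localized context tokens are derived, so the snippet below assumes one plausible reading, in which each prompt context token attends over the image patch features with a softmax over dot products and is then shifted by the attention-weighted patch summary. The name `localize_context_tokens` is hypothetical, and tokens and patch features are assumed to share the same dimensionality.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def localize_context_tokens(prompt_tokens, patch_features):
    """For each prompt context token, attend over the patch features and add the
    weighted patch summary to the token (assumed mechanism, for illustration only)."""
    localized = []
    for token in prompt_tokens:
        # Dot-product similarity between this token and every patch feature.
        scores = [sum(t * p for t, p in zip(token, patch)) for patch in patch_features]
        weights = softmax(scores)
        # Attention-weighted average of the patch features, one value per dimension.
        summary = [
            sum(w * patch[d] for w, patch in zip(weights, patch_features))
            for d in range(len(token))
        ]
        localized.append([t + s for t, s in zip(token, summary)])
    return localized


# Two hypothetical 2-dimensional patch features and one prompt context token.
patches = [[1.0, 0.0], [1.0, 0.0]]
tokens = [[1.0, 0.0]]
print(localize_context_tokens(tokens, patches))
```

In the pipeline the abstract outlines, tokens like these would then be passed to the text encoder of the vision language model to produce the text representation; that encoder is not modeled here.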
