UTILIZING INDIVIDUAL-CONCEPT TEXT-IMAGE ALIGNMENT TO ENHANCE COMPOSITIONAL CAPACITY OF TEXT-TO-IMAGE MODELS

    Publication No.: US20250078327A1

    Publication Date: 2025-03-06

    Application No.: US18457895

    Filing Date: 2023-08-29

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize a text-image alignment loss to train a diffusion model to generate digital images from input text. In particular, in some embodiments, the disclosed systems use a denoising step of a diffusion neural network to generate a prompt noise representation from a text prompt that includes a first text concept and a second text concept. Further, in some embodiments, the disclosed systems generate a first concept noise representation from the first text concept and a second concept noise representation from the second text concept. Moreover, the disclosed systems combine the first and second concept noise representations to generate a combined concept noise representation. Accordingly, in some embodiments, the disclosed systems modify parameters of the diffusion neural network by comparing the combined concept noise representation with the prompt noise representation.
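The loss described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two concept noise representations are combined or how the comparison is scored, so the element-wise mean and the mean squared error used here are assumptions, and `toy_denoise`, `combine_noise`, and `alignment_loss` are hypothetical names rather than anything taken from the disclosure. The toy denoiser only stands in for a diffusion network's denoising step so that the example runs without a trained model.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical denoiser signature: (noisy_latent, text, timestep) -> predicted noise.
Denoiser = Callable[[Sequence[float], str, int], Vector]


def combine_noise(first: Sequence[float], second: Sequence[float]) -> Vector:
    """Assumed combination rule: element-wise mean of the two concept predictions."""
    return [(a + b) / 2.0 for a, b in zip(first, second)]


def alignment_loss(
    denoise: Denoiser,
    noisy_latent: Sequence[float],
    prompt: str,
    first_concept: str,
    second_concept: str,
    timestep: int,
) -> float:
    """Assumed comparison: mean squared error between the combined concept
    noise representation and the prompt noise representation."""
    prompt_noise = denoise(noisy_latent, prompt, timestep)
    first_noise = denoise(noisy_latent, first_concept, timestep)
    second_noise = denoise(noisy_latent, second_concept, timestep)
    combined = combine_noise(first_noise, second_noise)
    squared = [(c - p) ** 2 for c, p in zip(combined, prompt_noise)]
    return sum(squared) / len(squared)


def toy_denoise(noisy_latent: Sequence[float], text: str, timestep: int) -> Vector:
    """Stand-in for one denoising step; not a real diffusion model.
    The prediction simply scales the latent by the word count of the text."""
    scale = len(text.split()) / 10.0
    return [x * scale + 0.01 * timestep for x in noisy_latent]


loss = alignment_loss(
    toy_denoise,
    noisy_latent=[1.0, 2.0],
    prompt="a cat and a dog",
    first_concept="a cat",
    second_concept="a dog",
    timestep=0,
)
```

In an actual training loop, this scalar would presumably be backpropagated through the network to update its parameters, possibly alongside the usual denoising objective; that step is omitted here because the abstract gives no details about it.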
