Publication No.: US20250078327A1
Publication Date: 2025-03-06
Application No.: US18457895
Filing Date: 2023-08-29
Applicant: Adobe Inc.
Inventor: Zhipeng Bao, Yijun Li, Krishna Kumar Singh
Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize a text-image alignment loss to train a diffusion model to generate digital images from input text. In particular, in some embodiments, the disclosed systems generate a prompt noise representation from a text prompt with a first text concept and a second text concept using a denoising step of a diffusion neural network. Further, in some embodiments, the disclosed systems generate a first concept noise representation from the first text concept and a second concept noise representation from the second text concept. Moreover, the disclosed systems combine the first and second concept noise representations to generate a combined concept noise representation. Accordingly, in some embodiments, by comparing the combined concept noise representation and the prompt noise representation, the disclosed systems modify parameters of the diffusion neural network.
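
The loss described in the abstract can be illustrated with a minimal sketch. This is not the disclosed implementation: the abstract does not say how the concept noise representations are combined or how the comparison is measured, so the element-wise sum and mean squared error below are assumptions. The names `combine`, `alignment_loss`, `text_image_alignment_loss`, and `toy_predict_noise` are hypothetical. The toy predictor only stands in for a denoising step of a diffusion neural network, which in practice would also take a noisy latent and a timestep.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def combine(first: Sequence[float], second: Sequence[float]) -> Vector:
    # Assumption: the two concept noise representations are combined by
    # element-wise addition. The abstract only says they are "combined".
    return [a + b for a, b in zip(first, second)]


def alignment_loss(prompt_noise: Sequence[float],
                   combined_noise: Sequence[float]) -> float:
    # Assumption: the comparison is a mean squared error between the
    # prompt noise representation and the combined concept representation.
    squared = [(p - c) ** 2 for p, c in zip(prompt_noise, combined_noise)]
    return sum(squared) / len(squared)


def text_image_alignment_loss(predict_noise: Callable[[str], Vector],
                              prompt: str,
                              first_concept: str,
                              second_concept: str) -> float:
    # One noise prediction for the full prompt, one per text concept.
    prompt_noise = predict_noise(prompt)
    first_noise = predict_noise(first_concept)
    second_noise = predict_noise(second_concept)
    combined_noise = combine(first_noise, second_noise)
    return alignment_loss(prompt_noise, combined_noise)


def toy_predict_noise(text: str) -> Vector:
    # Hypothetical stand-in for a denoising step: maps text to a
    # two-element vector (word count, total character count).
    words = text.split()
    return [float(len(words)), float(sum(len(w) for w in words))]


loss = text_image_alignment_loss(
    toy_predict_noise, "a cat and a dog", "a cat", "a dog"
)
print(loss)
```

In a training loop, a scalar like `loss` would be backpropagated to modify the parameters of the diffusion neural network; that step is omitted here because it depends on the framework and model in use.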