Publication No.: US20240386623A1
Publication Date: 2024-11-21
Application No.: US18477764
Application Date: 2023-09-29
Applicant: Salesforce, Inc.
Inventor: Ning YU, Can QIN, Shu ZHANG, Yihao FENG, Xinyi YANG, Ran XU
IPC: G06T11/00, G06T5/20, G06V10/771
Abstract: Embodiments described herein provide a method of image generation that uses a fixed diffusion model and a trainable diffusion model. The fixed diffusion model may be pretrained on a large training corpus. The trainable diffusion model may be used to control the image generation of the fixed diffusion model by modifying internal representations of the fixed diffusion model. A task instruction may be provided in addition to a text prompt, and the task instruction, together with the visual conditions, may guide the trainable diffusion model. The visual conditions may be adapted according to the task instruction. During training, a fixed number of task instructions may be used. At inference, unseen task instructions may be handled by combining convolutional kernels of the visual condition adapter.
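
The last sentence of the abstract can be illustrated with a minimal sketch. The abstract does not say how the kernels are combined, so the weighted average below is only an assumption, and the task names, kernel values, mixing weights, and the helper functions `combine_kernels` and `conv2d_valid` are all hypothetical.

```python
from typing import Dict, List

Kernel = List[List[float]]  # one 2-D convolution kernel


def combine_kernels(kernels: Dict[str, Kernel], weights: Dict[str, float]) -> Kernel:
    """Weighted average of per-task kernels (assumed mixing rule, not from the source)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    names = list(weights)
    rows, cols = len(kernels[names[0]]), len(kernels[names[0]][0])
    mixed = [[0.0] * cols for _ in range(rows)]
    for name in names:
        w = weights[name] / total  # normalize so the weights sum to 1
        for r in range(rows):
            for c in range(cols):
                mixed[r][c] += w * kernels[name][r][c]
    return mixed


def conv2d_valid(image: List[List[float]], kernel: Kernel) -> List[List[float]]:
    """Plain 'valid' 2-D cross-correlation, the operation a CNN layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical adapter kernels learned for two training-time task instructions.
task_kernels = {
    "edge_to_image": [[0.0, 1.0], [1.0, 0.0]],
    "depth_to_image": [[1.0, 0.0], [0.0, 1.0]],
}

# Hypothetical mixing weights for an unseen task instruction, e.g. derived
# from its similarity to the training-time instructions (an assumption).
unseen_weights = {"edge_to_image": 1.0, "depth_to_image": 3.0}

mixed_kernel = combine_kernels(task_kernels, unseen_weights)
condition = [[1.0, 2.0], [3.0, 4.0]]  # toy visual condition map
features = conv2d_valid(condition, mixed_kernel)
```

Because convolution is linear in the kernel, convolving with the mixed kernel gives the same result as mixing the outputs of the per-task kernels, which is one reason a kernel-level combination is a plausible reading of the abstract.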