Publication No.: US20250095250A1
Publication Date: 2025-03-20
Application No.: US18749438
Filing Date: 2024-06-20
Inventor: Haoran WANG, Zeke XIE, Yunfeng CAI, Mingming SUN
Abstract: A method is provided that includes: obtaining a reference image and a description text; extracting a text feature of the description text; and performing the following operations based on a pre-trained diffusion model to generate a target image: in each time step of the diffusion model: calculating a first cross-attention feature of a first image feature and the text feature; obtaining a second cross-attention feature of a second image feature of the reference image and the text feature; editing the first cross-attention feature based on the second cross-attention feature to obtain a third cross-attention feature; and generating a result image feature of the time step based on the third cross-attention feature and the text feature; and decoding a result image feature of a last time step to generate the target image.
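
The per-time-step operations named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the claimed implementation. The abstract does not say how the first cross-attention feature is "edited" or how the result image feature is produced, so the linear blend in `edit_attention`, the residual update in `denoise_step`, the reuse of a fixed reference feature at every step, and all names and parameters (`w_q`, `w_k`, `w_v`, `alpha`, `steps`) are assumptions made for illustration only. Text encoding and final decoding are omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(image_feat, text_feat, w_q, w_k):
    """Attention weights of image positions (queries) over text tokens (keys)."""
    q = matmul(image_feat, w_q)                 # (n_positions, d)
    k = matmul(text_feat, w_k)                  # (n_tokens, d)
    k_t = [list(col) for col in zip(*k)]        # (d, n_tokens)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, k_t)                     # (n_positions, n_tokens)
    return softmax_rows([[s / scale for s in row] for row in scores])


def edit_attention(first, second, alpha=0.5):
    """ASSUMPTION: the edit is a linear blend; the abstract leaves it unspecified."""
    return [[(1 - alpha) * f + alpha * s for f, s in zip(fr, sr)]
            for fr, sr in zip(first, second)]


def denoise_step(image_feat, ref_feat, text_feat, params, alpha=0.5):
    """One time step: first, second, and third cross-attention, then an update."""
    w_q, w_k, w_v = params
    first = cross_attention(image_feat, text_feat, w_q, w_k)
    second = cross_attention(ref_feat, text_feat, w_q, w_k)
    third = edit_attention(first, second, alpha)
    # ASSUMPTION: a residual update stands in for the real denoising network.
    update = matmul(third, matmul(text_feat, w_v))
    return [[x + u for x, u in zip(xr, ur)] for xr, ur in zip(image_feat, update)]


def generate(ref_feat, text_feat, params, steps=4, alpha=0.5, seed=0):
    """Run the per-step edit from random noise; the result would go to a decoder."""
    rng = random.Random(seed)
    x = [[rng.gauss(0, 1) for _ in row] for row in ref_feat]
    for _ in range(steps):
        x = denoise_step(x, ref_feat, text_feat, params, alpha)
    return x
```

In this sketch, `alpha=0` leaves the first cross-attention feature unchanged and `alpha=1` replaces it with the reference-derived one; intermediate values trade off the two.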