Publication No.: US20240346722A1
Publication Date: 2024-10-17
Application No.: US18423226
Filing Date: 2024-01-25
Inventor: Taehyun OH, Hyunwoo HA, Sungbin KIM
Abstract: A method is provided for training an image generating model that generates an image from audio. The method includes: selecting at least one frame from a video including a plurality of frames, based on a correlation between the audio and the image of each frame; extracting image information and audio information from each selected frame; and training an audio feature vector extracting model that extracts an audio feature vector from the audio information, wherein the audio feature vector is aligned, within an embedding space, with an image feature vector extracted from the image information by a pre-trained image feature vector extracting model.
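
The two steps named in the abstract, frame selection by audio-image correlation and alignment of audio feature vectors with image feature vectors, can be illustrated with a minimal sketch. The abstract does not state how the correlation is scored or which objective enforces the alignment, so everything below is an assumption made for illustration: the function names `select_frames` and `contrastive_alignment_loss` are hypothetical, the top-k selection rule is one possible choice, and the InfoNCE-style contrastive loss is a commonly used stand-in rather than the claimed method. The image feature vectors are treated as fixed inputs, matching the pre-trained image feature vector extracting model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_frames(correlations, top_k=1):
    """Return, in temporal order, the indices of the top_k frames whose
    audio-image correlation score is highest.

    Assumption: one precomputed score per frame; the abstract does not say
    how the score is obtained or how many frames are kept.
    """
    ranked = sorted(range(len(correlations)),
                    key=lambda i: correlations[i], reverse=True)
    return sorted(ranked[:top_k])


def contrastive_alignment_loss(audio_vecs, image_vecs, temperature=0.1):
    """InfoNCE-style loss (an assumed objective, not taken from the source).

    audio_vecs[i] should be closer to image_vecs[i] than to any other image
    vector. The image vectors come from a frozen, pre-trained extractor, so
    only the audio extractor would receive gradients during training.
    """
    total = 0.0
    for i, audio in enumerate(audio_vecs):
        logits = [cosine_similarity(audio, image) / temperature
                  for image in image_vecs]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(audio_vecs)


# Usage: keep the two best-correlated frames out of four.
selected = select_frames([0.1, 0.9, 0.4, 0.8], top_k=2)

# Usage: a matched audio/image pairing scores a lower loss than a swapped one.
image_vecs = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = contrastive_alignment_loss([[1.0, 0.0], [0.0, 1.0]], image_vecs)
swapped_loss = contrastive_alignment_loss([[0.0, 1.0], [1.0, 0.0]], image_vecs)
```

In a real training loop the audio vectors would be produced by a learnable audio extractor and the loss would be minimized with a gradient-based optimizer; that part is omitted here because the abstract gives no architectural details.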