-
1.
Publication No.: US20240013558A1
Publication date: 2024-01-11
Application No.: US18113266
Application date: 2023-02-23
Inventor: Haoran WANG, Dongliang HE, Fu LI, Errui DING
IPC: G06V20/70, G06V10/774, G06V20/40, G06F40/30, G06F40/279
CPC classification number: G06V20/70, G06V10/774, G06V20/46, G06F40/30, G06F40/279
Abstract: There is provided cross-modal feature extraction, retrieval, and model training methods and apparatuses, and a medium, which relates to the field of artificial intelligence (AI) technologies, and specifically to fields of deep learning, image processing, and computer vision technologies. A specific implementation solution involves: acquiring to-be-processed data, the to-be-processed data corresponding to at least two types of first modalities; determining first data of a second modality in the to-be-processed data, the second modality being any of the types of the first modalities; performing semantic entity extraction on the first data to obtain semantic entities; and acquiring semantic coding features of the first data based on the first data and the semantic entities and by using a pre-trained cross-modal feature extraction model.
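As a rough illustration of the steps in this abstract (pick the data of one modality, extract semantic entities from it, then encode the data together with those entities), a toy sketch might look like the following. Everything here is an assumption for illustration: `ENTITY_VOCAB`, `extract_entities`, `encode`, and the hashed bag-of-tokens vector are hypothetical stand-ins, not the patent's entity extractor or its pre-trained cross-modal model, and only the text modality is handled.

```python
import hashlib
import math

DIM = 8  # toy feature dimension (assumption)

# Hypothetical entity vocabulary; the abstract does not say how entities are found.
ENTITY_VOCAB = {"dog", "ball", "park"}


def extract_entities(text):
    """Toy semantic entity extraction: keep tokens listed in ENTITY_VOCAB."""
    return [tok for tok in text.lower().split() if tok in ENTITY_VOCAB]


def _bucket(token):
    # Stable hash, so the result does not depend on PYTHONHASHSEED.
    return int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM


def encode(tokens, entities, entity_weight=2.0):
    """Stand-in for the pre-trained model: hashed bag of tokens, entities up-weighted."""
    vec = [0.0] * DIM
    for tok in tokens:
        vec[_bucket(tok)] += 1.0
    for ent in entities:
        vec[_bucket(ent)] += entity_weight
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # L2-normalised feature vector


def semantic_coding_features(data, modality):
    """Select the data of one modality, extract entities, encode both together."""
    first_data = data[modality]
    entities = extract_entities(first_data)
    return entities, encode(first_data.lower().split(), entities)


# To-be-processed data with two modalities; the video entry is only a placeholder.
to_be_processed = {"text": "A dog chases a ball", "video": "<frames>"}
entities, features = semantic_coding_features(to_be_processed, "text")
```

The point of the sketch is only the data flow: the entities are fed to the encoder alongside the raw data rather than replacing it.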
-
Publication No.: US20230215136A1
Publication date: 2023-07-06
Application No.: US18113826
Application date: 2023-02-24
Inventor: Haoran WANG, Dongliang HE, Fu LI, Errui DING
CPC classification number: G06V10/761, G06V10/7715
Abstract: The present disclosure provides a method and apparatus for training a multi-modal data matching degree calculation model, a method and apparatus for calculating a multi-modal data matching degree, an electronic device, a computer readable storage medium and a computer program product, and relates to the field of artificial intelligence technology such as deep learning, image processing and computer vision. The method comprises: acquiring first sample data and second sample data that are different in modalities; constructing a contrastive learning loss function comprising a semantic perplexity parameter, the semantic perplexity parameter being determined based on a semantic feature distance between the first sample data and the second sample data; and training, by using the contrastive learning loss function, an initial multi-modal data matching degree calculation model through a contrastive learning approach, to obtain a target multi-modal data matching degree calculation model.
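The abstract does not state how the semantic perplexity parameter enters the loss. One possible reading, sketched below under stated assumptions, is an InfoNCE-style contrastive loss in which each matched pair's cosine distance acts as its perplexity and widens that pair's softmax temperature. The names `base_temp` and `alpha`, the choice of cosine distance, and the temperature-scaling rule are all hypothetical, and feature vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(first_feats, second_feats, base_temp=0.1, alpha=1.0):
    """InfoNCE-style loss; first_feats[i] and second_feats[i] form the positive pair."""
    n = len(first_feats)
    total = 0.0
    for i in range(n):
        sims = [cosine(first_feats[i], second_feats[j]) for j in range(n)]
        # Assumption: perplexity = cosine distance of the matched pair, in [0, 2].
        perplexity = 1.0 - sims[i]
        # Assumption: a more perplexing pair gets a softer (larger) temperature.
        temp = base_temp * (1.0 + alpha * perplexity)
        logits = [s / temp for s in sims]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n


# Two-sample batch: well-aligned pairs versus swapped pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In a real training setup the features would come from the model being trained and the loss would be minimised by gradient descent; this sketch only evaluates the loss value.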
-
Publication No.: US20250095250A1
Publication date: 2025-03-20
Application No.: US18749438
Application date: 2024-06-20
Inventor: Haoran WANG, Zeke XIE, Yunfeng CAI, Mingming SUN
Abstract: A method is provided that includes: obtaining a reference image and a description text; extracting a text feature of the description text; and performing the following operations based on a pre-trained diffusion model to generate a target image: in each time step of the diffusion model: calculating a first cross-attention feature of a first image feature and the text feature; obtaining a second cross-attention feature of a second image feature of the reference image and the text feature; editing the first cross-attention feature based on the second cross-attention feature to obtain a third cross-attention feature; and generating a result image feature of the time step based on the third cross-attention feature and the text feature; and decoding a result image feature of a last time step to generate the target image.
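The per-time-step loop in this abstract can be sketched with toy list-based tensors, as below. This is a minimal sketch under loud assumptions: the "cross-attention feature" is taken to be a softmax attention map over text tokens, "editing" is taken to be a linear blend controlled by a hypothetical `blend` weight, and the text features double as keys and values. The reference features per step are supplied by the caller, and the diffusion model itself, noise scheduling, and the final decoding step are omitted.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_feat, text_feat):
    """Attention map: one row of weights over text tokens per image token."""
    d = len(text_feat[0])
    return [
        softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                 for key in text_feat])
        for query in image_feat
    ]


def edit(first_attn, second_attn, blend=0.8):
    """Assumed editing rule: blend the generated map toward the reference map."""
    return [[(1.0 - blend) * a + blend * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first_attn, second_attn)]


def apply_attention(attn, text_feat):
    """Result image feature: attention-weighted sum of text features."""
    d = len(text_feat[0])
    return [[sum(w * tok[c] for w, tok in zip(row, text_feat)) for c in range(d)]
            for row in attn]


def generate(first_image_feat, reference_feats, text_feat, steps=3):
    feat = first_image_feat
    for t in range(steps):
        first_attn = cross_attention(feat, text_feat)                 # first feature
        second_attn = cross_attention(reference_feats[t], text_feat)  # second feature
        third_attn = edit(first_attn, second_attn)                    # third feature
        feat = apply_attention(third_attn, text_feat)                 # step result
    return feat  # a real pipeline would decode this into the target image


text = [[1.0, 0.0], [0.0, 1.0]]                 # two text tokens
noise = [[0.5, -0.2], [0.1, 0.3]]               # two image tokens
reference = [[[1.0, 0.0], [0.0, 1.0]]] * 3      # one reference feature per step
result = generate(noise, reference, text)
```

Because the toy text features are unit basis vectors, each row of `result` equals the edited attention weights for that image token, which makes the effect of the blend easy to inspect.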
-
-