METHOD FOR TRAINING MULTI-MODAL DATA MATCHING DEGREE CALCULATION MODEL, METHOD FOR CALCULATING MULTI-MODAL DATA MATCHING DEGREE, AND RELATED APPARATUSES

    Publication No.: US20230215136A1

    Publication Date: 2023-07-06

    Application No.: US18113826

    Filing Date: 2023-02-24

    CPC classification number: G06V10/761 G06V10/7715

    Abstract: The present disclosure provides a method and apparatus for training a multi-modal data matching degree calculation model, a method and apparatus for calculating a multi-modal data matching degree, an electronic device, a computer readable storage medium and a computer program product, and relates to the field of artificial intelligence technology such as deep learning, image processing and computer vision. The method comprises: acquiring first sample data and second sample data that are different in modalities; constructing a contrastive learning loss function comprising a semantic perplexity parameter, the semantic perplexity parameter being determined based on a semantic feature distance between the first sample data and the second sample data; and training, by using the contrastive learning loss function, an initial multi-modal data matching degree calculation model through a contrastive learning approach, to obtain a target multi-modal data matching degree calculation model.
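    As an illustrative reading of the abstract, the sketch below shows a contrastive (InfoNCE-style) loss in which each sample pair is weighted by a term derived from the semantic feature distance between the two modalities. The function name, the weighting form `1 + distance`, and the use of the pair distance as a perplexity proxy are assumptions for illustration, not the patent's actual formulation.

    ```python
    import numpy as np

    def perplexity_weighted_infonce(a_feats, b_feats, temperature=0.07):
        """Contrastive loss over two modalities, weighted per pair by a
        semantic-distance term (hypothetical reading of the abstract)."""
        # L2-normalize features from each modality
        a = a_feats / np.linalg.norm(a_feats, axis=1, keepdims=True)
        b = b_feats / np.linalg.norm(b_feats, axis=1, keepdims=True)
        logits = a @ b.T / temperature            # (N, N) pairwise similarities
        # Semantic feature distance of each matched (diagonal) pair
        pair_dist = np.linalg.norm(a - b, axis=1)  # (N,)
        # Assumed perplexity weight: semantically distant pairs count more
        weight = 1.0 + pair_dist
        # Row-wise softmax cross-entropy with the diagonal as positives
        logits = logits - logits.max(axis=1, keepdims=True)  # stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        per_sample = -np.diag(log_probs)
        return float((weight * per_sample).mean())
    ```

    In a training loop, this scalar would replace the standard symmetric contrastive loss; minimizing it pulls matched cross-modal pairs together while emphasizing pairs whose features are still far apart.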

    3. IMAGE STYLE TRANSFER
    Invention Application

    Publication No.: US20250095250A1

    Publication Date: 2025-03-20

    Application No.: US18749438

    Filing Date: 2024-06-20

    Abstract: A method is provided that includes: obtaining a reference image and a description text; extracting a text feature of the description text; and performing the following operations based on a pre-trained diffusion model to generate a target image: in each time step of the diffusion model: calculating a first cross-attention feature of a first image feature and the text feature; obtaining a second cross-attention feature of a second image feature of the reference image and the text feature; editing the first cross-attention feature based on the second cross-attention feature to obtain a third cross-attention feature; and generating a result image feature of the time step based on the third cross-attention feature and the text feature; and decoding a result image feature of a last time step to generate the target image.
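    The per-time-step attention edit described above can be sketched as follows. This is a minimal illustration assuming dot-product cross-attention between image patches (queries) and text tokens (keys); the function names, the omission of learned projection matrices, and the linear `blend` between the two attention maps are assumptions, not the claimed method itself.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def cross_attention(img_feat, txt_feat):
        """Attention of image patches (queries) over text tokens (keys).
        Learned Q/K projections are omitted for brevity."""
        scores = img_feat @ txt_feat.T / np.sqrt(img_feat.shape[-1])
        return softmax(scores, axis=-1)        # (num_patches, num_tokens)

    def edit_step(gen_feat, ref_feat, txt_feat, blend=0.8):
        """One denoising step's attention edit (illustrative sketch)."""
        attn_gen = cross_attention(gen_feat, txt_feat)  # first cross-attention
        attn_ref = cross_attention(ref_feat, txt_feat)  # second cross-attention
        # Edit the first map toward the reference's map (third cross-attention)
        attn_edit = blend * attn_ref + (1.0 - blend) * attn_gen
        # Aggregate text features into the step's result image feature
        return attn_edit @ txt_feat
    ```

    Repeating `edit_step` across all diffusion time steps and decoding the final result feature would yield the target image in this reading.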
