METHOD FOR TRAINING MULTI-MODAL DATA MATCHING DEGREE CALCULATION MODEL, METHOD FOR CALCULATING MULTI-MODAL DATA MATCHING DEGREE, AND RELATED APPARATUSES

    Publication number: US20230215136A1

    Publication date: 2023-07-06

    Application number: US18113826

    Filing date: 2023-02-24

    CPC classification number: G06V10/761 G06V10/7715

    Abstract: The present disclosure provides a method and apparatus for training a multi-modal data matching degree calculation model, a method and apparatus for calculating a multi-modal data matching degree, an electronic device, a computer readable storage medium and a computer program product, and relates to the field of artificial intelligence technology such as deep learning, image processing and computer vision. The method comprises: acquiring first sample data and second sample data that are different in modalities; constructing a contrastive learning loss function comprising a semantic perplexity parameter, the semantic perplexity parameter being determined based on a semantic feature distance between the first sample data and the second sample data; and training, by using the contrastive learning loss function, an initial multi-modal data matching degree calculation model through a contrastive learning approach, to obtain a target multi-modal data matching degree calculation model.
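The abstract does not state how the semantic perplexity parameter enters the loss, so the following is only a minimal sketch under stated assumptions. It uses an InfoNCE-style contrastive loss in which each negative pair is scaled by a weight derived from a semantic feature distance. The names `perplexity_weight` and `contrastive_loss`, the weight formula `1 - exp(-distance)`, and the idea that the distances come from a separate, fixed feature extractor are all hypothetical and are not taken from the patent.

```python
import math


def perplexity_weight(distance):
    """Hypothetical mapping from a semantic feature distance to a weight in [0, 1).

    Pairs that are semantically close (small distance) get a small weight, so they
    contribute less as negatives. This formula is an assumption for illustration.
    """
    return 1.0 - math.exp(-distance)


def contrastive_loss(similarity, semantic_distance, temperature=0.1):
    """InfoNCE-style loss with perplexity-weighted negatives (illustrative only).

    similarity[i][j]        -- model similarity between first-modality sample i
                               and second-modality sample j
    semantic_distance[i][j] -- assumed semantic feature distance for the same pair
    """
    n = len(similarity)
    total = 0.0
    for i in range(n):
        positive = math.exp(similarity[i][i] / temperature)
        denominator = positive
        for j in range(n):
            if j != i:
                weight = perplexity_weight(semantic_distance[i][j])
                denominator += weight * math.exp(similarity[i][j] / temperature)
        total -= math.log(positive / denominator)
    return total / n


# Toy data: two image-text pairs, made up for this example.
similarity = [[0.9, 0.8], [0.1, 0.7]]
far = [[0.0, 5.0], [5.0, 0.0]]    # negatives are semantically distant
near = [[0.0, 0.1], [0.1, 0.0]]   # negatives are semantically close

loss_far = contrastive_loss(similarity, far)
loss_near = contrastive_loss(similarity, near)
```

In this sketch, semantically close negatives are penalized less than distant ones, which is one plausible way to keep likely false negatives from dominating training.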

    VIDEO REPAIRING METHODS, APPARATUS, DEVICE, MEDIUM AND PRODUCTS

    Publication number: US20230008473A1

    Publication date: 2023-01-12

    Application number: US17944745

    Filing date: 2022-09-14

    Abstract: A video repairing method, apparatus, device, medium, and product are provided. The method includes: acquiring a to-be-repaired video frame sequence; determining a target category corresponding to each pixel in the to-be-repaired video frame sequence based on the to-be-repaired video frame sequence and a preset category detection model; determining, from the to-be-repaired video frame sequence, to-be-repaired pixels each with a target category being a to-be-repaired category; and performing repairing on to-be-repaired areas corresponding to the to-be-repaired pixels to obtain a target video frame sequence.
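The steps in this abstract (classify each pixel, select the pixels whose category is the to-be-repaired one, repair those areas) can be sketched roughly as follows. This is an illustration only. The `classify_pixel` callback stands in for the preset category detection model, the category label `"scratch"` is invented, and the repair rule (averaging unflagged neighbours in the same frame) is an assumed placeholder, not the repair method of the patent.

```python
def repair_frames(frames, classify_pixel, repair_category="scratch"):
    """Repair flagged pixels in a sequence of grayscale frames (lists of rows)."""
    repaired = []
    for frame in frames:
        height, width = len(frame), len(frame[0])
        # Step 1: determine a category for every pixel and build the repair mask.
        mask = [
            [classify_pixel(frame[y][x]) == repair_category for x in range(width)]
            for y in range(height)
        ]
        out = [row[:] for row in frame]
        # Step 2: replace each flagged pixel with the mean of its unflagged neighbours.
        for y in range(height):
            for x in range(width):
                if not mask[y][x]:
                    continue
                neighbours = [
                    frame[ny][nx]
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                    if not mask[ny][nx]
                ]
                if neighbours:
                    out[y][x] = sum(neighbours) / len(neighbours)
        repaired.append(out)
    return repaired


# Toy classifier: treat saturated pixels as damage (an assumption for this example).
def toy_classifier(value):
    return "scratch" if value >= 250 else "normal"


frames = [[[10, 10, 10], [10, 255, 10], [10, 10, 10]]]
result = repair_frames(frames, toy_classifier)
```

A real system would presumably also draw on neighbouring frames, since the abstract operates on a frame sequence; the sketch repairs each frame independently for brevity.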

    METHOD FOR PROCESSING IMAGE, DEVICE AND STORAGE MEDIUM

    Publication number: US20220319141A1

    Publication date: 2022-10-06

    Application number: US17845843

    Filing date: 2022-06-21

    Abstract: A method for processing an image, a device, and a storage medium are provided. The method may include: inputting a target image into a pre-trained image segmentation model, the target image including at least one sub-image; extracting high-level semantic features and low-level features of the target image through the image segmentation model, and determining target location information of the sub-image in the target image based on the high-level semantic features and the low-level features; and performing a preset processing operation on the sub-image, based on the target location information of the sub-image.

    METHOD OF PROCESSING VIDEO, METHOD OF QUERING VIDEO, AND METHOD OF TRAINING MODEL

    Publication number: US20230130006A1

    Publication date: 2023-04-27

    Application number: US18145724

    Filing date: 2022-12-22

    Abstract: The present application provides a method of processing a video, a method of querying a video, and a method of training a video processing model. A specific implementation solution of the method of processing the video includes: extracting, for a video to be processed, a plurality of video features under a plurality of receptive fields; extracting a local feature of the video to be processed according to a video feature under a target receptive field in the plurality of receptive fields; obtaining a global feature of the video to be processed according to a video feature under a largest receptive field in the plurality of receptive fields; and merging the local feature and the global feature to obtain a target feature of the video to be processed.
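As a rough illustration of the local/global merging described in this abstract, the sketch below treats a video as a list of per-frame scalar values and models each receptive field as a non-overlapping average-pooling window. The window sizes, the choice of average pooling, and merging by concatenation are all assumptions made for this example; the abstract does not specify any of them.

```python
def pool(frames, window):
    """Average-pool a 1-D sequence with non-overlapping windows of the given size."""
    return [
        sum(frames[i:i + window]) / window
        for i in range(0, len(frames) - window + 1, window)
    ]


def video_feature(frames, windows=(1, 2, 4), target_window=2):
    """Merge a local and a global feature (illustrative only)."""
    # Features under several receptive fields, keyed by window size.
    features = {w: pool(frames, w) for w in windows}
    # Local feature: taken from the target receptive field.
    local = features[target_window]
    # Global feature: summarize the feature under the largest receptive field.
    largest = features[max(windows)]
    global_feature = sum(largest) / len(largest)
    # Merge by concatenation (an assumed merging rule).
    return local + [global_feature]


target = video_feature([1.0, 3.0, 5.0, 7.0])
```

With the four-frame toy input, the local part keeps two values from the window-2 pooling and the global part is a single value from the window-4 pooling.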

    METHOD AND APPARATUS FOR PRE-TRAINING SEMANTIC REPRESENTATION MODEL AND ELECTRONIC DEVICE

    Publication number: US20230147550A1

    Publication date: 2023-05-11

    Application number: US18051594

    Filing date: 2022-11-01

    CPC classification number: G06V10/774 G06V20/41 G06F40/30 G06V30/19147

    Abstract: A method for pre-training a semantic representation model includes: for each video-text pair in pre-training data, determining a mask image sequence, a mask character sequence, and a mask image-character sequence of the video-text pair; determining a plurality of feature sequences and mask position prediction results respectively corresponding to the plurality of feature sequences by inputting the mask image sequence, the mask character sequence, and the mask image-character sequence into an initial semantic representation model; and building a loss function based on the plurality of feature sequences, the mask position prediction results respectively corresponding to the plurality of feature sequences and true mask position results, and adjusting coefficients of the semantic representation model to realize training.
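The first step of this abstract, building three masked inputs per video-text pair together with the true mask positions, might look something like the sketch below. The `[MASK]` placeholder, the 25% masking ratio, random position sampling, and forming the image-character sequence by simple concatenation are assumptions for illustration; the abstract does not describe how masking is performed.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token


def mask_sequence(tokens, ratio, rng):
    """Mask a fraction of positions; return the masked sequence and true positions."""
    count = max(1, int(len(tokens) * ratio))
    positions = sorted(rng.sample(range(len(tokens)), count))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    return masked, positions


def build_masked_inputs(frames, characters, ratio=0.25, seed=0):
    """Build the three masked sequences for one video-text pair (illustrative only)."""
    rng = random.Random(seed)
    mask_images, image_positions = mask_sequence(frames, ratio, rng)
    mask_chars, char_positions = mask_sequence(characters, ratio, rng)
    # Joint sequence: frames followed by characters (an assumed layout).
    mask_joint, joint_positions = mask_sequence(frames + characters, ratio, rng)
    return {
        "image": (mask_images, image_positions),
        "character": (mask_chars, char_positions),
        "image_character": (mask_joint, joint_positions),
    }


inputs = build_masked_inputs(["f0", "f1", "f2", "f3"], list("cats"))
```

The returned positions play the role of the true mask position results against which a model's mask position predictions would be compared when building the loss.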
