-
Publication No.: US20230215136A1
Publication date: 2023-07-06
Application No.: US18113826
Application date: 2023-02-24
Inventor: Haoran WANG, Dongliang HE, Fu LI, Errui DING
CPC classification number: G06V10/761, G06V10/7715
Abstract: The present disclosure provides a method and apparatus for training a multi-modal data matching degree calculation model, a method and apparatus for calculating a multi-modal data matching degree, an electronic device, a computer readable storage medium and a computer program product, and relates to the field of artificial intelligence technology such as deep learning, image processing and computer vision. The method comprises: acquiring first sample data and second sample data that are different in modalities; constructing a contrastive learning loss function comprising a semantic perplexity parameter, the semantic perplexity parameter being determined based on a semantic feature distance between the first sample data and the second sample data; and training, by using the contrastive learning loss function, an initial multi-modal data matching degree calculation model through a contrastive learning approach, to obtain a target multi-modal data matching degree calculation model.
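The abstract does not say how the semantic perplexity parameter enters the loss. The following is a minimal Python sketch under stated assumptions, not the patented formulation: the perplexity of a matched pair is taken to be its cosine distance, and it is assumed to soften the temperature of an InfoNCE-style contrastive loss. The names `perplexity`, `contrastive_loss`, and `base_temp` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def perplexity(u, v):
    # Hypothetical definition: cosine distance of the matched pair, in [0, 2].
    return 1.0 - cosine(u, v)

def contrastive_loss(first, second, base_temp=0.1):
    """InfoNCE-style loss over matched pairs (first[i], second[i])."""
    total = 0.0
    for i, anchor in enumerate(first):
        # Assumption: a more "perplexing" pair gets a larger temperature.
        temp = base_temp * (1.0 + perplexity(anchor, second[i]))
        logits = [cosine(anchor, other) / temp for other in second]
        log_denominator = math.log(sum(math.exp(z) for z in logits))
        # Negative log-probability of picking the matched sample.
        total += log_denominator - logits[i]
    return total / len(first)

# Toy features for two modalities (for example image and text embeddings).
image_features = [[1.0, 0.0], [0.0, 1.0]]
text_features = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = contrastive_loss(image_features, text_features)
shuffled_loss = contrastive_loss(image_features, text_features[::-1])
```

With these toy features, the loss for correctly paired samples is smaller than the loss for the shuffled pairing, which is the behavior a matching-degree model is trained toward.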
-
Publication No.: US20230008473A1
Publication date: 2023-01-12
Application No.: US17944745
Application date: 2022-09-14
Inventor: Xin LI, He ZHENG, Fanglong LIU, Dongliang HE
IPC: H04N19/159, H04N19/182, G06V20/40
Abstract: A video repairing method, apparatus, device, medium, and product are provided. The method includes: acquiring a to-be-repaired video frame sequence; determining a target category corresponding to each pixel in the to-be-repaired video frame sequence based on the to-be-repaired video frame sequence and a preset category detection model; determining, from the to-be-repaired video frame sequence, to-be-repaired pixels each with a target category being a to-be-repaired category; and performing repairing on to-be-repaired areas corresponding to the to-be-repaired pixels to obtain a target video frame sequence.
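The steps in this abstract (per-pixel category, selection of to-be-repaired pixels, repair) can be sketched as follows. This is an illustration only: `detect_categories` is a hypothetical threshold rule standing in for the preset category detection model, and the neighbor-averaging repair is an assumed placeholder for whatever repair the patent actually performs. Frames are modeled as 2D lists of grayscale integers.

```python
def detect_categories(frame, threshold=250):
    # Stand-in for the preset category detection model: saturated pixels
    # are labeled "repair", everything else "keep".
    return [["repair" if p >= threshold else "keep" for p in row] for row in frame]

def repair_frame(frame, categories):
    """Replace each "repair" pixel with the mean of its "keep" 4-neighbors."""
    height, width = len(frame), len(frame[0])
    repaired = [row[:] for row in frame]  # leave the input frame untouched
    for y in range(height):
        for x in range(width):
            if categories[y][x] != "repair":
                continue
            neighbors = [
                frame[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
                and categories[ny][nx] == "keep"
            ]
            if neighbors:  # pixels with no clean neighbor are left as they are
                repaired[y][x] = sum(neighbors) // len(neighbors)
    return repaired

def repair_video(frames):
    """Apply detection and repair to every frame in the sequence."""
    return [repair_frame(f, detect_categories(f)) for f in frames]

frames = [[[10, 10, 10], [10, 255, 10], [10, 10, 10]]]
target_frames = repair_video(frames)
```

In the toy sequence above, the single saturated center pixel is replaced by the average of its four clean neighbors, and the input frames are not modified.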
-
Publication No.: US20220319141A1
Publication date: 2022-10-06
Application No.: US17845843
Application date: 2022-06-21
Inventor: Fanglong LIU, Xin LI, Dongliang HE
IPC: G06V10/22, G06T7/11, H04N19/174
Abstract: A method for processing an image, a device, and a storage medium are provided. The method may include: inputting a target image into a pre-trained image segmentation model, the target image including at least one sub-image; extracting high-level semantic features and low-level features of the target image through the image segmentation model, and determining target location information of the sub-image in the target image based on the high-level semantic features and the low-level features; and performing a preset processing operation on the sub-image, based on the target location information of the sub-image.
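The last two steps of this abstract (locating the sub-image, then processing it) can be illustrated with a short Python sketch. It assumes the segmentation model has already produced a binary mask, treats the location information as a bounding box, and uses cropping as an example of a preset processing operation. All three choices, and the names `bounding_box` and `crop`, are assumptions rather than details from the patent.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the non-zero region, or None."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # the mask marks no sub-image
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    # bottom and right are exclusive, so they can be used directly in slices.
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1

def crop(image, box):
    """Example processing operation: cut the sub-image out of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mask = [[0, 0, 0], [0, 1, 1], [0, 0, 0]]  # hypothetical model output
box = bounding_box(mask)
sub_image = crop(image, box)  # [[5, 6]]
```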
-
4.
Publication No.: US20240013558A1
Publication date: 2024-01-11
Application No.: US18113266
Application date: 2023-02-23
Inventor: Haoran WANG, Dongliang HE, Fu LI, Errui DING
IPC: G06V20/70, G06V10/774, G06V20/40, G06F40/30, G06F40/279
CPC classification number: G06V20/70, G06V10/774, G06V20/46, G06F40/30, G06F40/279
Abstract: There is provided cross-modal feature extraction, retrieval, and model training methods and apparatuses, and a medium, which relates to the field of artificial intelligence (AI) technologies, and specifically to fields of deep learning, image processing, and computer vision technologies. A specific implementation solution involves: acquiring to-be-processed data, the to-be-processed data corresponding to at least two types of first modalities; determining first data of a second modality in the to-be-processed data, the second modality being any of the types of the first modalities; performing semantic entity extraction on the first data to obtain semantic entities; and acquiring semantic coding features of the first data based on the first data and the semantic entities and by using a pre-trained cross-modal feature extraction model.
-
Publication No.: US20230130006A1
Publication date: 2023-04-27
Application No.: US18145724
Application date: 2022-12-22
Inventor: Dongliang HE, Errui DING, Haifeng WANG
IPC: G06V20/40, G06V10/774, G06V10/86, G06F16/73, G06F16/783
Abstract: The present application provides a method of processing a video, a method of querying a video, and a method of training a video processing model. A specific implementation solution of the method of processing the video includes: extracting, for a video to be processed, a plurality of video features under a plurality of receptive fields; extracting a local feature of the video to be processed according to a video feature under a target receptive field in the plurality of receptive fields; obtaining a global feature of the video to be processed according to a video feature under a largest receptive field in the plurality of receptive fields; and merging the local feature and the global feature to obtain a target feature of the video to be processed.
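The local/global split described in this abstract can be sketched in a few lines of Python. The sketch makes several assumptions that are not in the source: each frame is reduced to a single score, a receptive field is modeled as a sliding-window mean of that many frames, the local feature is the strongest window response, the global feature is the average response under the largest field, and merging is concatenation. The function names are hypothetical.

```python
def window_means(values, size):
    """Mean over every sliding window of `size` consecutive frames."""
    return [sum(values[i:i + size]) / size for i in range(len(values) - size + 1)]

def video_feature(frame_scores, receptive_fields=(1, 2, 4), target_field=2):
    # One feature sequence per receptive field.
    features = {rf: window_means(frame_scores, rf) for rf in receptive_fields}
    # Local feature: strongest response under the target receptive field.
    local = max(features[target_field])
    # Global feature: average response under the largest receptive field.
    largest = max(receptive_fields)
    global_ = sum(features[largest]) / len(features[largest])
    # Merge by concatenation.
    return [local, global_]

target_feature = video_feature([0, 0, 4, 4])  # [4.0, 2.0]
```

The video must contain at least as many frames as the largest receptive field, otherwise the global window is empty.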
-
6.
Publication No.: US20230147550A1
Publication date: 2023-05-11
Application No.: US18051594
Application date: 2022-11-01
Inventor: Dongliang HE, Errui DING
IPC: G06V10/774, G06V20/40, G06F40/30, G06V30/19
CPC classification number: G06V10/774, G06V20/41, G06F40/30, G06V30/19147
Abstract: A method for pre-training a semantic representation model includes: for each video-text pair in pre-training data, determining a mask image sequence, a mask character sequence, and a mask image-character sequence of the video-text pair; determining a plurality of feature sequences and mask position prediction results respectively corresponding to the plurality of feature sequences by inputting the mask image sequence, the mask character sequence, and the mask image-character sequence into an initial semantic representation model; and building a loss function based on the plurality of feature sequences, the mask position prediction results respectively corresponding to the plurality of feature sequences and true mask position results, and adjusting coefficients of the semantic representation model to realize training.
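The first step of this abstract, building the three masked inputs for one video-text pair, might look like the Python sketch below. The masking ratio, the `[MASK]` placeholder, the random choice of positions, and building the joint sequence by appending the characters to the frames are all assumptions made for illustration. The model and loss are out of scope here.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token

def mask_sequence(tokens, ratio, rng):
    """Mask a fraction of positions; return the masked copy and the positions."""
    count = max(1, int(len(tokens) * ratio))
    positions = sorted(rng.sample(range(len(tokens)), count))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    return masked, positions

def build_masked_inputs(frames, characters, ratio=0.25, seed=0):
    rng = random.Random(seed)  # seeded so the example is reproducible
    return {
        "image": mask_sequence(frames, ratio, rng),
        "character": mask_sequence(characters, ratio, rng),
        # Joint sequence: frames followed by characters, masked together.
        "joint": mask_sequence(frames + characters, ratio, rng),
    }

inputs = build_masked_inputs(["f0", "f1", "f2", "f3"], list("abcd"))
joint_tokens, joint_positions = inputs["joint"]
```

The returned positions play the role of the true mask position results against which the model's mask position predictions would be compared.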