-
Publication No.: US20230010160A1
Publication date: 2023-01-12
Application No.: US17945415
Application date: 2022-09-15
Inventor: Shuai CHEN, Qi WANG, Hu YANG, Feng HE, Zhifan FENG, Chunguang CHAI, Yong ZHU
Abstract: Disclosed are a method for processing multimodal data using a neural network, a device, and a medium, relating to the field of artificial intelligence and, in particular, to multimodal data processing, video classification, and deep learning. The neural network includes: an input subnetwork configured to receive the multimodal data and output respective first features of a plurality of modalities; a plurality of cross-modal feature subnetworks, each configured to receive the respective first features of two corresponding modalities and output a cross-modal feature corresponding to those two modalities; a plurality of cross-modal fusion subnetworks, each configured to receive at least one cross-modal feature corresponding to a respective target modality and other modalities and output a second feature of the target modality; and an output subnetwork configured to receive the respective second features of the plurality of modalities and output a processing result of the multimodal data.
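The data flow described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: every function name is hypothetical, and the element-wise product, mean, and sum stand in for learned subnetworks whose actual form the abstract does not specify.

```python
from itertools import combinations


def input_subnetwork(data):
    # Placeholder encoder (assumption): each modality's vector is passed
    # through unchanged as its "first feature".
    return {modality: list(vec) for modality, vec in data.items()}


def cross_modal_feature(a, b):
    # Placeholder for one cross-modal feature subnetwork (assumption):
    # element-wise product of the two first features.
    return [x * y for x, y in zip(a, b)]


def fuse(cross_feats):
    # Placeholder for one cross-modal fusion subnetwork (assumption):
    # element-wise mean of the cross-modal features involving the target.
    n = len(cross_feats)
    return [sum(col) / n for col in zip(*cross_feats)]


def output_subnetwork(second):
    # Placeholder head (assumption): sum of all second-feature entries.
    return sum(sum(vec) for vec in second.values())


def forward(data):
    first = input_subnetwork(data)
    # One cross-modal feature per unordered pair of modalities.
    cross = {
        frozenset(pair): cross_modal_feature(first[pair[0]], first[pair[1]])
        for pair in combinations(sorted(first), 2)
    }
    # One second feature per target modality, built from every
    # cross-modal feature that involves that modality.
    second = {
        m: fuse([f for pair, f in cross.items() if m in pair]) for m in first
    }
    return output_subnetwork(second), second


score, second = forward(
    {"image": [1.0, 2.0], "text": [3.0, 4.0], "audio": [5.0, 6.0]}
)
```

With three modalities there are three modality pairs, so each target modality is fused from the two cross-modal features that involve it.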
-
Publication No.: US20220284246A1
Publication date: 2022-09-08
Application No.: US17502385
Application date: 2021-10-15
Inventor: Feng HE, Qi WANG, Zhifan FENG, Hu YANG, Chunguang CHAI
IPC: G06K9/62
Abstract: The present disclosure provides a method for training a cross-modal retrieval model, an electronic device, and a storage medium, and relates to the field of computer technologies, particularly to artificial intelligence technologies such as knowledge graphs, computer vision, and deep learning. The method for training a cross-modal retrieval model includes: determining a similarity of a cross-modal sample pair, the cross-modal sample pair including a sample of a first modality and a sample of a second modality, the first modality being different from the second modality; determining a soft margin based on the similarity, and determining a soft margin loss function based on the soft margin; and determining a total loss function based on the soft margin loss function, and training the cross-modal retrieval model according to the total loss function.
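The training steps in this abstract can be sketched as follows. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: cosine similarity as the similarity measure, a margin that shrinks linearly as the negative pair becomes more similar, a hinge-style triplet loss, and a batch mean as the total loss. The names `soft_margin`, `soft_margin_loss`, `total_loss`, and `max_margin` are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_margin(similarity, max_margin=0.2):
    # Hypothetical mapping (assumption): the more similar the pair,
    # the smaller the margin it is required to respect.
    return max_margin * (1.0 - similarity)


def soft_margin_loss(s_pos, s_neg, margin):
    # Hinge-style ranking loss using the per-pair margin (assumption).
    return max(0.0, margin - s_pos + s_neg)


def total_loss(batch, max_margin=0.2):
    # batch: list of (anchor, positive, negative) embeddings, where the
    # anchor comes from one modality and the other two from the second.
    losses = []
    for anchor, positive, negative in batch:
        s_pos = cosine(anchor, positive)
        s_neg = cosine(anchor, negative)
        margin = soft_margin(s_neg, max_margin)
        losses.append(soft_margin_loss(s_pos, s_neg, margin))
    # Total loss taken as the batch mean (assumption).
    return sum(losses) / len(losses)


batch = [
    ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]),  # well-separated triplet
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]),  # badly ranked triplet
]
loss = total_loss(batch)
```

Compared with a fixed margin, a similarity-dependent margin of this kind penalizes negatives that are genuinely close to the anchor less harshly than clearly unrelated ones.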
-