Patent search ap:("Beijing Baidu Netcom Science Technology Co., Ltd.") AND inv:"Zhifan FENG"

11.

Invention application
MULTIMODAL DATA PROCESSING (status: in force)

Publication No.: US20230010160A1

Publication date: 2023-01-12

Application No.: US17945415

Filing date: 2022-09-15

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Shuai CHEN, Qi WANG, Hu YANG, Feng HE, Zhifan FENG, Chunguang CHAI, Yong ZHU

IPC: G06V10/82, G06V10/80, G06N3/08

Abstract: Disclosed are a method for processing multimodal data using a neural network, a device, and a medium, relating to the field of artificial intelligence and, in particular, to multimodal data processing, video classification, and deep learning. The neural network includes: an input subnetwork configured to receive the multimodal data and output respective first features of a plurality of modalities; a plurality of cross-modal feature subnetworks, each configured to receive the respective first features of two corresponding modalities and output a cross-modal feature corresponding to those two modalities; a plurality of cross-modal fusion subnetworks, each configured to receive at least one cross-modal feature corresponding to a target modality and other modalities and output a second feature of the target modality; and an output subnetwork configured to receive the respective second features of the plurality of modalities and output a processing result of the multimodal data.
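
The data flow described in this abstract can be pictured with a toy sketch. This is not the patented network: the function names, the scaling in the input stage, the element-wise product used as the cross-modal feature, the mean used for fusion, and the summed score used as the processing result are all assumptions chosen only to make the four stages concrete.

```python
from itertools import combinations

def input_subnetwork(multimodal_data):
    """Map each modality's raw values to a 'first feature' (assumed: simple scaling)."""
    return {name: [0.5 * x for x in values] for name, values in multimodal_data.items()}

def cross_modal_feature(feat_a, feat_b):
    """Combine the first features of two modalities (assumed: element-wise product)."""
    return [a * b for a, b in zip(feat_a, feat_b)]

def cross_modal_fusion(target, cross_features):
    """Fuse every cross-modal feature that involves the target modality (assumed: mean)."""
    related = [feat for pair, feat in cross_features.items() if target in pair]
    return [sum(column) / len(related) for column in zip(*related)]

def output_subnetwork(second_features):
    """Turn all second features into a processing result (assumed: one summed score)."""
    return sum(sum(feat) for feat in second_features.values())

def process(multimodal_data):
    # Stage 1: one first feature per modality.
    first = input_subnetwork(multimodal_data)
    # Stage 2: one cross-modal feature per unordered pair of modalities.
    cross = {pair: cross_modal_feature(first[pair[0]], first[pair[1]])
             for pair in combinations(sorted(first), 2)}
    # Stage 3: one second feature per target modality.
    second = {modality: cross_modal_fusion(modality, cross) for modality in first}
    # Stage 4: a single result computed from all second features.
    return first, cross, second, output_subnetwork(second)

# Hypothetical input: three modalities with two raw values each.
data = {"image": [2.0, 4.0], "text": [6.0, 8.0], "audio": [10.0, 12.0]}
first, cross, second, result = process(data)
print(second)
print(result)
```

In a real model each stage would be a trained neural module; the sketch only shows which stage consumes which outputs.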

12.

Invention application
METHOD FOR TRAINING CROSS-MODAL RETRIEVAL MODEL, ELECTRONIC DEVICE AND STORAGE MEDIUM (status: in force)

Publication No.: US20220284246A1

Publication date: 2022-09-08

Application No.: US17502385

Filing date: 2021-10-15

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Feng HE, Qi WANG, Zhifan FENG, Hu YANG, Chunguang CHAI

IPC: G06K9/62

Abstract: The present disclosure provides a method for training a cross-modal retrieval model, an electronic device and a storage medium, and relates to the field of computer technologies, particularly to artificial intelligence technologies such as knowledge graph, computer vision, and deep learning technologies. The method for training a cross-modal retrieval model includes: determining a similarity of a cross-modal sample pair according to the cross-modal sample pair, the cross-modal sample pair including a sample of a first modality and a sample of a second modality, the first modality being different from the second modality; determining a soft margin based on the similarity, and determining a soft margin loss function based on the soft margin; and determining a total loss function based on the soft margin loss function, and training the cross-modal retrieval model according to the total loss function.
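
The training steps named in this abstract (similarity, then soft margin, then soft margin loss, then total loss) can be illustrated with a small sketch. The abstract does not give the formulas, so everything specific here is an assumption: cosine similarity as the similarity measure, a linear mapping from similarity to margin, a triplet-style hinge loss, and a total loss that adds a fixed-margin term to the soft-margin term. All function and parameter names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def soft_margin(similarity, max_margin=0.4):
    """Assumed mapping: the less similar the pair, the larger the margin.

    Maps a similarity in [-1, 1] linearly onto a margin in [max_margin, 0].
    """
    return max_margin * (1.0 - similarity) / 2.0

def hinge(margin, sim_pos, sim_neg):
    """Triplet-style hinge: penalize negatives that come within `margin` of the positive."""
    return max(0.0, margin - sim_pos + sim_neg)

def total_loss(anchor, positive, negative, hard_margin=0.2, soft_weight=1.0):
    """Assumed total loss: fixed-margin hinge plus a weighted soft-margin hinge.

    `anchor` is an embedding from the first modality (e.g. text); `positive`
    and `negative` are embeddings from the second modality (e.g. video).
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    # The margin adapts to how similar the anchor/negative pair already is.
    margin = soft_margin(sim_neg)
    soft_loss = hinge(margin, sim_pos, sim_neg)
    hard_loss = hinge(hard_margin, sim_pos, sim_neg)
    return hard_loss + soft_weight * soft_loss

# Hypothetical 2-D embeddings: a well-separated triplet and a confused one.
easy = total_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
hard = total_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(easy, hard)
```

The point of a similarity-dependent margin, under these assumptions, is that the separation demanded between positive and negative pairs varies per sample instead of being a single fixed constant.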
