Patent search: ap:("BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.") AND inv:"Shuai CHEN"

1.

Invention application
MULTIMODAL DATA PROCESSING (Status: Valid)

Publication No.: US20230010160A1

Publication Date: 2023-01-12

Application No.: US17945415

Application Date: 2022-09-15

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventors: Shuai CHEN, Qi WANG, Hu YANG, Feng HE, Zhifan FENG, Chunguang CHAI, Yong ZHU

IPC: G06V10/82, G06V10/80, G06N3/08

Abstract: Disclosed are a method for processing multimodal data using a neural network, a device, and a medium, relating to the field of artificial intelligence and, in particular, to multimodal data processing, video classification, and deep learning. The neural network includes: an input subnetwork configured to receive the multimodal data and output respective first features of a plurality of modalities; a plurality of cross-modal feature subnetworks, each of which is configured to receive the respective first features of two corresponding modalities and output a cross-modal feature corresponding to the two modalities; a plurality of cross-modal fusion subnetworks, each of which is configured to receive at least one cross-modal feature between a corresponding target modality and the other modalities and output a second feature of the target modality; and an output subnetwork configured to receive the respective second features of the plurality of modalities and output a processing result of the multimodal data.
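The abstract fixes only the data flow between the four subnetworks, not what happens inside each one. The sketch below traces that flow on toy vectors. Every internal operation is a hypothetical placeholder chosen for readability rather than the patented design: the input subnetwork is an identity copy, the cross-modal feature is an element-wise product, fusion adds the mean of the related cross-modal features to the target's own first feature, and the output subnetwork simply concatenates.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Vector = List[float]
Pair = Tuple[str, str]


def input_subnetwork(raw: Dict[str, Vector]) -> Dict[str, Vector]:
    """Placeholder encoder: copies each modality's vector as its first feature."""
    return {name: list(vec) for name, vec in raw.items()}


def cross_modal_subnetwork(a: Vector, b: Vector) -> Vector:
    """Hypothetical pairwise interaction: element-wise product of two first features."""
    return [x * y for x, y in zip(a, b)]


def cross_modal_features(first: Dict[str, Vector]) -> Dict[Pair, Vector]:
    """One cross-modal feature per unordered pair of modalities."""
    return {
        (m1, m2): cross_modal_subnetwork(first[m1], first[m2])
        for m1, m2 in combinations(sorted(first), 2)
    }


def fusion_subnetwork(target: str, first: Dict[str, Vector],
                      cross: Dict[Pair, Vector]) -> Vector:
    """Hypothetical fusion: target's first feature plus the mean of every
    cross-modal feature that involves the target modality."""
    related = [feat for pair, feat in cross.items() if target in pair]
    dim = len(first[target])
    mean = ([sum(f[i] for f in related) / len(related) for i in range(dim)]
            if related else [0.0] * dim)
    return [x + m for x, m in zip(first[target], mean)]


def output_subnetwork(second: Dict[str, Vector]) -> Vector:
    """Placeholder head: concatenates second features in sorted modality order."""
    return [x for name in sorted(second) for x in second[name]]


def process(raw: Dict[str, Vector]) -> Vector:
    first = input_subnetwork(raw)
    cross = cross_modal_features(first)
    second = {m: fusion_subnetwork(m, first, cross) for m in first}
    return output_subnetwork(second)


print(process({"audio": [1.0, 2.0], "text": [3.0, 4.0], "video": [5.0, 6.0]}))
# -> [5.0, 12.0, 12.0, 20.0, 15.0, 24.0]
```

With three modalities there are three pairwise cross-modal features. Each fusion step sees only the two features that involve its target, which is the "target modality and other modalities" relationship the abstract describes.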

2.

Invention application
METHOD OF PROCESSING MULTIMEDIA DATA, DEVICE AND MEDIUM (Status: Valid)

Publication No.: US20230115737A1

Publication Date: 2023-04-13

Application No.: US18080432

Application Date: 2022-12-13

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventors: Shuai CHEN, Qi WANG, Zhifan FENG, Chunguang CHAI, Yong ZHU

IPC: G06F16/483, G06F16/43, G06F18/25, G06F18/22, G06N5/02

Abstract: A method of processing multimedia data, a device, and a medium are provided, relating to the field of artificial intelligence technology and, in particular, to the fields of knowledge graphs and deep learning. The method of processing the multimedia data includes: recognizing the multimedia data to obtain at least one piece of key information of the multimedia data; querying a predetermined knowledge base according to the at least one piece of key information to determine a multimedia name associated with the at least one piece of key information and an association degree between the multimedia name and the at least one piece of key information; and, in response to the association degree being less than a first threshold value, determining the name of the multimedia data from the multimedia name based on a similarity between the multimedia data and alternative multimedia data for the multimedia name.
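The decision logic in this abstract can be sketched as follows. The recognition step is out of scope, so key information is passed in as strings. Several details are assumptions not stated in the abstract: the knowledge-base layout (`KnowledgeBase`), the association metric (the fraction of key-information items linked to a name), the use of cosine similarity between feature vectors, the default `first_threshold=0.8`, and the choice to accept the best-associated name directly when the threshold is met. All function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical knowledge base: key information (e.g. a recognised actor)
# -> multimedia names it is associated with.
KnowledgeBase = Dict[str, List[str]]


def association_degrees(key_info: Sequence[str], kb: KnowledgeBase) -> Dict[str, float]:
    """Assumed metric: fraction of the key-information items linked to each name."""
    counts: Dict[str, int] = {}
    for item in key_info:
        for name in kb.get(item, []):
            counts[name] = counts.get(name, 0) + 1
    return {name: c / len(key_info) for name, c in counts.items()}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def name_multimedia(key_info: Sequence[str], features: Sequence[float],
                    kb: KnowledgeBase,
                    alternatives: Dict[str, List[Sequence[float]]],
                    first_threshold: float = 0.8) -> Optional[str]:
    degrees = association_degrees(key_info, kb)
    if not degrees:
        return None  # nothing in the knowledge base matched
    best = max(degrees, key=degrees.get)
    if degrees[best] >= first_threshold:
        return best  # assumed: a strong association is accepted directly

    # Association too weak: compare the input with the alternative multimedia
    # data stored for each candidate name and keep the most similar name.
    def best_similarity(name: str) -> float:
        return max((cosine(features, alt) for alt in alternatives.get(name, [])),
                   default=0.0)

    return max(degrees, key=best_similarity)
```

For example, with key information `["actor_a", "unknown"]` both candidate names score 0.5, which falls below the threshold. The name whose stored alternative clip is closest to the input's features is then returned.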

3.

Invention application
METHOD FOR TRAINING IMAGE-TEXT MATCHING MODEL, COMPUTING DEVICE, AND STORAGE MEDIUM (Status: Valid)

Publication No.: US20230005284A1

Publication Date: 2023-01-05

Application No.: US17943458

Application Date: 2022-09-13

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventors: Feng HE, Qi WANG, Hu YANG, Shuai CHEN, Zhifan FENG, Chunguang CHAI

IPC: G06V30/19, G06F16/583

Abstract: A computer-implemented method is provided. The method includes: obtaining a sample text and a sample image corresponding to the sample text; labeling a true semantic tag for the sample text according to a first preset rule; obtaining a text feature representation of the sample text and a predicted semantic tag output by a text coding sub-model; obtaining an image feature representation of the sample image output by an image coding sub-model; calculating a first loss based on the true semantic tag and the predicted semantic tag; calculating a contrast loss based on the text feature representation of the sample text and the image feature representation of the sample image; adjusting parameters of the text coding sub-model based on the first loss and the contrast loss; and adjusting parameters of the image coding sub-model based on the contrast loss.
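The distinctive point of this abstract is how the losses are routed. The text coding sub-model is adjusted against the first (tag) loss plus the contrast loss, while the image coding sub-model is adjusted against the contrast loss only. The sketch below computes those two objectives for a small batch. It assumes softmax cross-entropy for the first loss and a symmetric InfoNCE loss (temperature 0.07) for the contrast loss; the abstract names neither. Gradient computation and parameter updates are left out, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def _normalise(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def contrastive_loss(text_feats: Sequence[Sequence[float]],
                     image_feats: Sequence[Sequence[float]],
                     temperature: float = 0.07) -> float:
    """Assumed symmetric InfoNCE: the i-th text and i-th image are the positive pair."""
    t = [_normalise(v) for v in text_feats]
    im = [_normalise(v) for v in image_feats]
    n = len(t)
    sims = [[sum(a * b for a, b in zip(t[i], im[j])) / temperature
             for j in range(n)] for i in range(n)]
    text_to_image = sum(softmax_cross_entropy(sims[i], i) for i in range(n)) / n
    image_to_text = sum(softmax_cross_entropy([sims[i][j] for i in range(n)], j)
                        for j in range(n)) / n
    return (text_to_image + image_to_text) / 2


def training_objectives(tag_logits: Sequence[Sequence[float]],
                        true_tags: Sequence[int],
                        text_feats: Sequence[Sequence[float]],
                        image_feats: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Loss each sub-model's parameters would be adjusted against."""
    first_loss = sum(softmax_cross_entropy(l, y)
                     for l, y in zip(tag_logits, true_tags)) / len(true_tags)
    contrast = contrastive_loss(text_feats, image_feats)
    return {
        "text_coding_sub_model": first_loss + contrast,
        "image_coding_sub_model": contrast,
    }
```

Because the image objective excludes the tag loss, the semantic-tag supervision shapes the text encoder directly. It reaches the image encoder only indirectly, through the shared contrastive alignment.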
