-
Publication No.: US20230005284A1
Publication Date: 2023-01-05
Application No.: US17943458
Application Date: 2022-09-13
Inventor: Feng HE , Qi WANG , Hu YANG , Shuai CHEN , Zhifan FENG , Chunguang CHAI
IPC: G06V30/19 , G06F16/583
Abstract: A computer-implemented method is provided. The method includes: obtaining a sample text and a sample image corresponding to the sample text; labeling a true semantic tag for the sample text according to a first preset rule; obtaining a text feature representation of the sample text and a predicted semantic tag output by a text coding sub-model; obtaining an image feature representation of the sample image output by an image coding sub-model; calculating a first loss based on the true semantic tag and the predicted semantic tag; calculating a contrast loss based on the text feature representation of the sample text and the image feature representation of the sample image; adjusting parameters of the text coding sub-model based on the first loss and the contrast loss; and adjusting parameters of the image coding sub-model based on the contrast loss.
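A minimal sketch of the training step described above, assuming PyTorch-style sub-models and an InfoNCE-style formulation of the contrast loss; all function and variable names here are illustrative, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def training_step(text_encoder, image_encoder, classifier_head,
                  text_batch, image_batch, true_tags, temperature=0.07):
    """One update over a batch of paired samples (illustrative sketch)."""
    text_feat = text_encoder(text_batch)      # (B, D) text feature representation
    image_feat = image_encoder(image_batch)   # (B, D) image feature representation

    # First loss: predicted semantic tag vs. the labeled true semantic tag.
    tag_logits = classifier_head(text_feat)
    first_loss = F.cross_entropy(tag_logits, true_tags)

    # Contrast loss: matched text/image pairs are pulled together and
    # mismatched pairs pushed apart (InfoNCE-style, an assumption here).
    text_n = F.normalize(text_feat, dim=-1)
    image_n = F.normalize(image_feat, dim=-1)
    logits = text_n @ image_n.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    contrast_loss = (F.cross_entropy(logits, targets) +
                     F.cross_entropy(logits.t(), targets)) / 2

    # The text coding sub-model receives gradients from both losses; the
    # image coding sub-model only from the contrast loss, since the first
    # loss has no gradient path into it.
    (first_loss + contrast_loss).backward()
    return first_loss.item(), contrast_loss.item()
```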
-
Publication No.: US20230010160A1
Publication Date: 2023-01-12
Application No.: US17945415
Application Date: 2022-09-15
Inventor: Shuai CHEN , Qi WANG , Hu YANG , Feng HE , Zhifan FENG , Chunguang CHAI , Yong ZHU
Abstract: Disclosed are a method for processing multimodal data using a neural network, a device, and a medium, relating to the field of artificial intelligence and, in particular, to multimodal data processing, video classification, and deep learning. The neural network includes: an input subnetwork configured to receive the multimodal data to output respective first features of a plurality of modalities; a plurality of cross-modal feature subnetworks, each of which is configured to receive respective first features of two corresponding modalities to output a cross-modal feature corresponding to the two modalities; a plurality of cross-modal fusion subnetworks, each of which is configured to receive at least one cross-modal feature corresponding to a corresponding target modality and other modalities to output a second feature of the target modality; and an output subnetwork configured to receive respective second features of the plurality of modalities to output a processing result of the multimodal data.
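A compact sketch of the described topology for three modalities, assuming simple linear layers for every subnetwork; the layer choices, dimensions, and class name are assumptions for illustration only:

```python
import itertools
import torch
import torch.nn as nn

class CrossModalNet(nn.Module):
    """Input subnetwork -> cross-modal feature subnetworks (one per modality
    pair) -> cross-modal fusion subnetworks (one per target modality) ->
    output subnetwork. Illustrative sketch only."""
    def __init__(self, in_dims, dim=256, num_classes=10):
        super().__init__()
        self.modalities = list(in_dims)
        self.pairs = list(itertools.combinations(self.modalities, 2))
        self.inputs = nn.ModuleDict({m: nn.Linear(d, dim) for m, d in in_dims.items()})
        self.cross = nn.ModuleDict({f"{a}_{b}": nn.Linear(2 * dim, dim) for a, b in self.pairs})
        n_rel = len(self.modalities) - 1   # pairs involving a given target modality
        self.fusion = nn.ModuleDict({m: nn.Linear(n_rel * dim, dim) for m in self.modalities})
        self.output = nn.Linear(len(self.modalities) * dim, num_classes)

    def forward(self, data):               # data: {modality_name: tensor of shape (B, in_dim)}
        first = {m: torch.relu(self.inputs[m](data[m])) for m in self.modalities}
        cross = {(a, b): torch.relu(self.cross[f"{a}_{b}"](torch.cat([first[a], first[b]], -1)))
                 for a, b in self.pairs}
        second = {m: torch.relu(self.fusion[m](
                      torch.cat([v for k, v in cross.items() if m in k], -1)))
                  for m in self.modalities}
        return self.output(torch.cat([second[m] for m in self.modalities], -1))

# Example: net = CrossModalNet({"text": 768, "vision": 2048, "audio": 128})
```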
-
Publication No.: US20220284246A1
Publication Date: 2022-09-08
Application No.: US17502385
Application Date: 2021-10-15
Inventor: Feng HE , Qi WANG , Zhifan FENG , Hu YANG , Chunguang CHAI
IPC: G06K9/62
Abstract: The present disclosure discloses a method for training a cross-modal retrieval model, an electronic device and a storage medium, and relates to the field of computer technologies, and particularly to the field of artificial intelligence technologies, such as knowledge graph technologies, computer vision technologies, deep learning technologies, or the like. The method for training a cross-modal retrieval model includes: determining similarity of a cross-modal sample pair according to the cross-modal sample pair, the cross-modal sample pair including a sample of a first modality and a sample of a second modality, and the first modality being different from the second modality; determining a soft margin based on the similarity, and determining a soft margin loss function based on the soft margin; and determining a total loss function based on the soft margin loss function, and training a cross-modal retrieval model according to the total loss function.
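As a rough illustration of a soft margin derived from pair similarity, here is one plausible formulation; the exact way the patent computes the similarity, the margin, and the total loss is not specified in the abstract, so this is an assumption:

```python
import torch.nn.functional as F

def soft_margin_ranking_loss(anchor, positive, negative,
                             base_margin=0.2, scale=0.2):
    """Ranking loss whose margin is not a fixed constant but is derived
    from the similarity of the cross-modal (anchor, positive) sample pair.
    Illustrative only; the concrete margin formula is an assumption."""
    anchor = F.normalize(anchor, dim=-1)      # e.g. text features
    positive = F.normalize(positive, dim=-1)  # matching image features
    negative = F.normalize(negative, dim=-1)  # non-matching image features

    sim_pos = (anchor * positive).sum(dim=-1)    # similarity of the sample pair
    sim_neg = (anchor * negative).sum(dim=-1)
    soft_margin = base_margin + scale * sim_pos  # margin adapts to pair similarity
    soft_margin_loss = F.relu(soft_margin - sim_pos + sim_neg).mean()
    return soft_margin_loss

# A total loss would typically add this term to other objectives, e.g.
# total = soft_margin_ranking_loss(t, i_pos, i_neg) + other_loss
```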
-
Publication No.: US20230115737A1
Publication Date: 2023-04-13
Application No.: US18080432
Application Date: 2022-12-13
Inventor: Shuai CHEN , Qi WANG , Zhifan FENG , Chunguang CHAI , Yong ZHU
IPC: G06F16/483 , G06F16/43 , G06F18/25 , G06F18/22 , G06N5/02
Abstract: A method of processing multimedia data, a device, and a medium are provided, relating to the field of artificial intelligence technology, in particular to the fields of knowledge graph and deep learning. The method of processing the multimedia data includes: recognizing the multimedia data so as to obtain at least one piece of key information of the multimedia data; querying a predetermined knowledge base according to the at least one piece of key information, so as to determine a multimedia name associated with the at least one piece of key information and an association degree between the multimedia name and the at least one piece of key information; and determining, in the multimedia name, a name of the multimedia data based on a similarity between alternative multimedia data for the multimedia name and the multimedia data, in response to the association degree being less than a first threshold value.
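A hypothetical end-to-end flow of this method in plain Python; `recognize`, `similarity`, the knowledge-base layout, and the function name are placeholders assumed for illustration, not the patent's API:

```python
def resolve_multimedia_name(media, knowledge_base, recognize, similarity,
                            first_threshold=0.8):
    """Illustrative sketch of: recognize -> query knowledge base ->
    fall back to content similarity when the association degree is low."""
    key_infos = recognize(media)   # e.g. title text, faces, audio keywords

    # Each piece of key information maps to candidate entries of the form
    # (multimedia_name, association_degree, alternative_media_list).
    candidates = [entry for info in key_infos
                  for entry in knowledge_base.get(info, [])]
    if not candidates:
        return None

    name, assoc, _ = max(candidates, key=lambda c: c[1])
    if assoc >= first_threshold:
        return name    # association degree alone is decisive

    # Association degree below the first threshold: pick the name whose
    # alternative multimedia data is most similar to the media itself.
    scored = [(max(similarity(media, alt) for alt in alts), cand_name)
              for cand_name, _, alts in candidates if alts]
    return max(scored, default=(0.0, None))[1]
```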
-
Publication No.: US20220284218A1
Publication Date: 2022-09-08
Application No.: US17502173
Application Date: 2021-10-15
Inventor: Hu YANG , Feng HE , Qi WANG , Zhifan FENG , Chunguang CHAI , Yong ZHU
Abstract: The present disclosure discloses a video classification method, an electronic device and a storage medium, and relates to the field of computer technologies, and particularly to the field of artificial intelligence technologies, such as knowledge graph technologies, computer vision technologies, deep learning technologies, or the like. The video classification method includes: extracting a keyword from a video according to multi-modal information of the video; acquiring background knowledge corresponding to the keyword, and determining a text to be recognized according to the keyword and the background knowledge; and classifying the text to be recognized to obtain a class of the video.
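Sketched as a plain pipeline, assuming placeholder helpers for keyword extraction, background-knowledge lookup (e.g. from a knowledge graph), and the text classifier; none of these names come from the patent:

```python
def classify_video(video, extract_keywords, fetch_background, text_classifier):
    """Illustrative pipeline for keyword -> background knowledge -> text
    classification; helper functions are assumed placeholders."""
    # 1. Keywords from the video's multi-modal information (title, speech, OCR, ...).
    keywords = extract_keywords(video)

    # 2. Background knowledge corresponding to each keyword.
    background = [fetch_background(kw) for kw in keywords]

    # 3. The text to be recognized combines keywords and their background knowledge.
    text_to_recognize = " ".join(keywords + [b for b in background if b])

    # 4. Classifying the text yields the class of the video.
    return text_classifier(text_to_recognize)
```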
-
Publication No.: US20220027634A1
Publication Date: 2022-01-27
Application No.: US17450158
Application Date: 2021-10-06
Inventor: Qi WANG , Zhifan FENG , Hu YANG , Chunguang CHAI
Abstract: A video processing method, an electronic device and a storage medium are provided, relating to the field of artificial intelligence and, in particular, to the fields of deep learning, model training, knowledge graph, video processing, and the like. The method includes: acquiring a plurality of first video frames, and performing fine-grained splitting on the plurality of first video frames to obtain a plurality of second video frames; performing feature encoding on the plurality of second video frames according to multi-mode information related to the plurality of second video frames, to obtain feature fusion information for characterizing fusion of the multi-mode information; and performing similarity matching on the plurality of second video frames according to the feature fusion information, and obtaining a target video according to a result of the similarity matching.
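A rough sketch of one way the splitting, fused feature encoding, and similarity matching could fit together, assuming cosine similarity between consecutive fused features and treating the longest matched segment as the target video; the helper functions and that grouping rule are assumptions, not the patent's definitions:

```python
import numpy as np

def build_target_video(first_frames, split_fine, encode_multimodal,
                       sim_threshold=0.9):
    """Illustrative sketch; `split_fine` and `encode_multimodal` stand in
    for the fine-grained splitting and multi-mode feature encoding steps."""
    # Fine-grained splitting of the first video frames into second video frames.
    second_frames = [f for frame in first_frames for f in split_fine(frame)]
    if not second_frames:
        return []

    # Feature encoding that fuses the multi-mode information (visual, audio,
    # text, ...) of each second frame into one normalized feature vector.
    fused = np.stack([encode_multimodal(f) for f in second_frames])
    fused = fused / np.linalg.norm(fused, axis=1, keepdims=True)

    # Similarity matching: consecutive second frames whose fused features are
    # similar enough are grouped together; the longest group is returned here.
    segments, current = [], [second_frames[0]]
    for i in range(1, len(second_frames)):
        if float(fused[i] @ fused[i - 1]) >= sim_threshold:
            current.append(second_frames[i])
        else:
            segments.append(current)
            current = [second_frames[i]]
    segments.append(current)
    return max(segments, key=len)
```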
-