Patent search: ap:("Beijing Baidu Netcom Science Technology Co., Ltd.") AND inv:"Longchao WANG"

1.

Invention application
Model Determination Method and Electronic Device (status: granted)

Publication No.: US20230124389A1

Publication date: 2023-04-20

Application No.: US17887690

Filing date: 2022-08-15

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Longchao WANG, Yipeng SUN, Kun YAO, Junyu HAN, Jingtuo LIU, Errui DING

IPC: G06V10/70, G06V10/774

Abstract: A model determination method and an electronic device are provided. The disclosure relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to image processing, image identification, and other scenarios. A specific implementation includes: acquiring an image sample and a text sample, wherein text data in the text sample provides a text description of target image data in the image sample; storing at least one image feature of the image sample in a first queue and at least one text feature of the text sample in a second queue; performing training on the first queue and the second queue to obtain a first target model; and determining the first target model as an initialization model for a second target model.
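The abstract does not say how the two queues are built or consumed during training. As a rough illustration only, the sketch below assumes each queue is a fixed-capacity FIFO buffer of feature vectors, a pattern common in queue-based contrastive pre-training. The class name `FeatureQueue`, the capacity, and the sample values are all hypothetical and are not taken from the patent.

```python
from collections import deque


class FeatureQueue:
    """Fixed-capacity FIFO store for feature vectors (hypothetical helper)."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._items = deque(maxlen=capacity)

    def enqueue(self, features):
        """Append each feature vector, evicting the oldest when over capacity."""
        for feature in features:
            self._items.append(tuple(feature))

    def snapshot(self):
        """Return the stored features, oldest first."""
        return list(self._items)

    def __len__(self):
        return len(self._items)


# Assumed setup: one queue per modality, as in the abstract.
image_queue = FeatureQueue(capacity=4)  # "first queue": image features
text_queue = FeatureQueue(capacity=4)   # "second queue": text features

image_queue.enqueue([[0.1, 0.2], [0.3, 0.4]])
text_queue.enqueue([[0.5, 0.6]])
```

A bounded queue keeps memory constant while letting later batches reuse features from earlier ones; whether the patented method works this way is not stated in the abstract.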

2.

Invention application
METHOD FOR TRAINING IMAGE RECOGNITION MODEL BASED ON SEMANTIC ENHANCEMENT (status: granted)

Publication No.: US20220392205A1

Publication date: 2022-12-08

Application No.: US17892669

Filing date: 2022-08-22

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Yipeng SUN, Rongqiao AN, Xiang WEI, Longchao WANG, Kun YAO, Junyu HAN, Jingtuo LIU, Errui DING

IPC: G06V10/80, G06V10/77

Abstract: Embodiments of the present disclosure provide a method and apparatus for training an image recognition model based on semantic enhancement, a method and apparatus for recognizing an image, an electronic device, and a computer readable storage medium. The method for training an image recognition model based on semantic enhancement comprises: extracting, from an inputted first image that is unannotated and has no textual description, a first feature representation of the first image; calculating a first loss function based on the first feature representation; extracting, from an inputted second image that is unannotated and has an original textual description, a second feature representation of the second image; calculating a second loss function based on the second feature representation; and training an image recognition model based on a fusion of the first loss function and the second loss function.
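The abstract does not specify how the two loss functions are fused. One simple possibility, shown here purely as an assumption, is a convex combination of the two scalar loss values. The function name `fuse_losses` and the `weight` parameter are hypothetical and do not come from the patent.

```python
def fuse_losses(first_loss, second_loss, weight=0.5):
    """Combine two scalar losses as a convex combination (assumed fusion rule).

    weight is the share given to first_loss; the rest goes to second_loss.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * first_loss + (1.0 - weight) * second_loss


# Example: loss from images without text, and loss from images with text.
total = fuse_losses(2.0, 4.0)
```

With the default `weight`, both losses contribute equally; other fusion rules (for example, learned or scheduled weights) would also fit the abstract's wording.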

3.

Invention publication
PRE-TRAINING METHOD, IMAGE AND TEXT RETRIEVAL METHOD FOR A VISION AND SCENE TEXT AGGREGATION MODEL, ELECTRONIC DEVICE, AND STORAGE MEDIUM (status: under examination, published)

Publication No.: US20230386168A1

Publication date: 2023-11-30

Application No.: US18192393

Filing date: 2023-03-29

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Yipeng SUN, Mengjun CHENG, Longchao WANG, Xiongwei ZHU, Kun YAO, Junyu HAN, Jingtuo LIU, Errui DING, Jingdong WANG, Haifeng WANG

IPC: G06V10/42, G06F16/583, H04N19/176

CPC classification number: G06V10/42, G06F16/5846, H04N19/176

Abstract: A pre-training method for a Vision and Scene Text Aggregation model includes: acquiring a sample image-text pair; extracting a sample scene text from a sample image; inputting a sample text into a text encoding network to obtain a sample text feature; inputting the sample image and an initial sample aggregation feature into a visual encoding subnetwork and inputting the initial sample aggregation feature and the sample scene text into a scene encoding subnetwork to obtain a global image feature of the sample image and a learned sample aggregation feature; and pre-training the Vision and Scene Text Aggregation model according to the sample text feature, the global image feature of the sample image, and the learned sample aggregation feature.
