Patent search ap:("Beijing Baidu Netcom Science Technology Co., Ltd.") AND inv:"Xiaoqiang ZHANG"

1.

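Invention publication
CHARACTER DETECTION METHOD AND APPARATUS, MODEL TRAINING METHOD AND APPARATUS, DEVICE AND STORAGE MEDIUM [Under examination, published]

Publication (announcement) No.: US20230196805A1

Publication (announcement) date: 2023-06-22

Application No.: US18168089

Application date: 2023-02-13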

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Ju HUANG, Xiaoqiang ZHANG, Xiameng QIN, Chengquan ZHANG, Kun YAO

IPC: G06V30/10, G06T7/11, G06V10/25, G06V10/44

CPC classification number: G06V30/10, G06T7/11, G06V10/25, G06V10/44

Abstract: The present disclosure provides a character detection method and apparatus, a model training method and apparatus, a device and a storage medium. The specific implementation is: acquiring a training sample, where the training sample includes a sample image and a marked image, and the marked image is an image obtained by marking a text instance in the sample image; inputting the sample image into a character detection model, to obtain segmented images and image types of the segmented images output by the character detection model, where the image type indicates that the segmented image includes a text instance, or the segmented image does not include a text instance; and adjusting a parameter of the character detection model according to the segmented images, the image types of the segmented images and the marked image.

2.

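Invention application
IMAGE CLASSIFICATION METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM [In force]

Publication (announcement) No.: US20220027611A1

Publication (announcement) date: 2022-01-27

Application No.: US17498226

Application date: 2021-10-11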

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Yuechen YU, Chengquan ZHANG, Yulin LI, Xiaoqiang ZHANG, Ju HUANG, Xiameng QIN, Kun YAO, Jingtuo LIU, Junyu HAN, Errui DING

IPC: G06K9/00, G06K9/62, G06N3/08

Abstract: Provided are an image classification method and apparatus, an electronic device and a storage medium, relating to the field of artificial intelligence and, in particular, to computer vision and deep learning. The method includes inputting a to-be-classified document image into a pretrained neural network and obtaining a feature submap of each text box of the to-be-classified document image by use of the neural network; inputting the feature submap of each text box, a semantic feature corresponding to preobtained text information of each text box and a position feature corresponding to preobtained position information of each text box into a pretrained multimodal feature fusion model and fusing, by use of the multimodal feature fusion model, the three into a multimodal feature corresponding to each text box; and classifying the to-be-classified document image based on the multimodal feature corresponding to each text box.
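The per-text-box fusion and document-level classification described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the multimodal feature fusion model combines the three inputs or how the document is classified, so everything here is an assumption: plain concatenation stands in for the fusion model, mean pooling plus a linear scorer stands in for the classifier, and the names `fuse_text_box`, `classify_document`, and the example weights are hypothetical.

```python
from typing import List, Sequence


def fuse_text_box(
    visual: Sequence[float],
    semantic: Sequence[float],
    position: Sequence[float],
) -> List[float]:
    """Fuse the three per-box features by concatenation.

    Assumption: concatenation is a stand-in for the pretrained
    multimodal feature fusion model named in the abstract.
    """
    return [*visual, *semantic, *position]


def classify_document(
    box_features: List[List[float]],
    class_weights: List[List[float]],
) -> int:
    """Mean-pool the fused box features, score each class with a
    dot product, and return the index of the highest-scoring class."""
    n = len(box_features)
    dim = len(box_features[0])
    pooled = [sum(f[i] for f in box_features) / n for i in range(dim)]
    scores = [
        sum(w * x for w, x in zip(weights, pooled))
        for weights in class_weights
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Two text boxes, each with a 2-d visual feature submap,
# a 1-d semantic feature, and a 1-d position feature.
boxes = [
    fuse_text_box([1.0, 0.0], [0.5], [0.1]),
    fuse_text_box([0.0, 1.0], [0.5], [0.9]),
]
# Hypothetical weights for two document classes.
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
predicted = classify_document(boxes, weights)
```

In this toy setup the second class weights the semantic feature most heavily, so it wins; a real system would learn both the fusion and the classifier.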

3.

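Invention publication
IMAGE-BASED INFORMATION EXTRACTION MODEL, METHOD, AND APPARATUS, DEVICE, AND STORAGE MEDIUM [Under examination, published]

Publication (announcement) No.: US20240021000A1

Publication (announcement) date: 2024-01-18

Application No.: US18113178

Application date: 2023-02-23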

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Xiameng QIN, Yulin LI, Xiaoqiang ZHANG, Ju HUANG, Qunyi XIE, Kun YAO

IPC: G06V30/19, G06V30/148

CPC classification number: G06V30/1918, G06V30/15, G06V30/19127, G06V30/19147

Abstract: There is provided an image-based information extraction model, method, and apparatus, a device, and a storage medium, which relate to the field of artificial intelligence (AI) technologies, specifically to the fields of deep learning, image processing, and computer vision, and are applicable to optical character recognition (OCR) and other scenarios. A specific implementation solution involves: acquiring a to-be-extracted first image and a category of to-be-extracted information; and inputting the first image and the category into a pre-trained information extraction model to perform information extraction on the first image to obtain text information corresponding to the category.

4.

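Invention application
METHOD AND APPARATUS FOR VISUAL QUESTION ANSWERING, COMPUTER DEVICE AND MEDIUM [In force]

Publication (announcement) No.: US20210406619A1

Publication (announcement) date: 2021-12-30

Application No.: US17169112

Application date: 2021-02-05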

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Pengyuan LV, Xiaoqiang ZHANG, Shanshan LIU, Chengquan ZHANG, Qiming PENG, Sijin WU, Hua LU, Yongfeng CHEN

IPC: G06K9/72, G06T7/70, G06F40/30, G06K9/46, G06K9/00, G06K9/32, G06K9/20, G06K9/62, G06N20/00, G06N5/04

Abstract: The present disclosure provides a method for visual question answering, which relates to fields of computer vision and natural language processing. The method includes: acquiring an input image and an input question; detecting visual information and position information of each of at least one text region in the input image; determining semantic information and attribute information of each of the at least one text region based on the visual information and the position information; determining a global feature of the input image based on the visual information, the position information, the semantic information, and the attribute information; determining a question feature based on the input question; and generating a predicted answer for the input image and the input question based on the global feature and the question feature. The present disclosure further provides a device for visual question answering, a computer device and a medium.

5.

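Invention publication
METHOD OF TRAINING TEXT DETECTION MODEL, METHOD OF DETECTING TEXT, AND DEVICE [Under examination, published]

Publication (announcement) No.: US20240265718A1

Publication (announcement) date: 2024-08-08

Application No.: US18041370

Application date: 2022-04-22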

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Xiaoqiang ZHANG, Xiameng QIN, Chengquan ZHANG, Kun YAO

IPC: G06V30/19, G06V10/77, G06V10/82

CPC classification number: G06V30/19127, G06V10/7715, G06V10/82

Abstract: A method of training a text detection model and a method of detecting a text. The training method includes: inputting a sample image into a text feature extraction sub-model of a text detection model to obtain a text feature of a text in the sample image, the sample image having a label indicating actual position information and an actual category; inputting a predetermined text vector into a text encoding sub-model of the text detection model to obtain a text reference feature; inputting the text feature and the text reference feature into a decoding sub-model of the text detection model to obtain a text sequence vector; inputting the text sequence vector into an output sub-model of the text detection model to obtain predicted position information and a predicted category; and training the text detection model based on the predicted and actual categories and on the predicted and actual position information.
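The final training step in this abstract compares predicted and actual categories and predicted and actual positions. The abstract does not name the loss functions, so the following is only a sketch under stated assumptions: cross-entropy for the category term, an L1 distance for the position term, and a hypothetical `box_weight` to balance the two. The function names and the box layout `[x_min, y_min, x_max, y_max]` are illustrative, not taken from the patent.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-probability assigned to the actual category."""
    return -math.log(probs[target])


def l1_distance(pred_box: Sequence[float], true_box: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - t) for p, t in zip(pred_box, true_box))


def detection_loss(
    pred_probs: Sequence[float],
    true_category: int,
    pred_box: Sequence[float],
    true_box: Sequence[float],
    box_weight: float = 1.0,  # hypothetical balancing factor
) -> float:
    """Category loss plus weighted position loss (assumed combination)."""
    return cross_entropy(pred_probs, true_category) + box_weight * l1_distance(
        pred_box, true_box
    )


# One predicted text instance: uniform category scores over two classes,
# and a box whose bottom edge is off by 0.5 (normalized coordinates).
loss = detection_loss(
    pred_probs=[0.5, 0.5],
    true_category=1,
    pred_box=[0.0, 0.0, 1.0, 1.0],
    true_box=[0.0, 0.0, 1.0, 0.5],
)
```

Here the loss is the sum of the category term, ln 2, and the position term, 0.5; a training loop would minimize this quantity over the parameters of all four sub-models.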
