Patent search ap:("BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO. Page LTD.") AND inv:"Ju HUANG"

1.

发明申请
METHOD AND PLATFORM OF GENERATING DOCUMENT, ELECTRONIC DEVICE AND STORAGE MEDIUM 有权

公开(公告)号：US20230048495A1

公开(公告)日：2023-02-16

申请号：US17974183

申请日：2022-10-26

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor： Qunyi XIE , Xiameng QIN , Mengyi EN , Dongdong ZHANG , Ju HUANG , Yangliu XU , Yi CHEN , Kun YAO

IPC: G06V30/413 , G06V10/764 , G06V10/24 , G06V10/75 , G06V30/414

Abstract: A method and a platform of generating a document, an electronic device, and a storage medium are provided, which relate to a field of an artificial intelligence technology, in particular to fields of computer vision and deep learning technologies, and may be applied to a text recognition scenario and other scenarios. The method includes: performing a category recognition on a document picture to obtain a target category result; determining a target structured model matched with the target category result; and performing, by using the target structured model, a structure recognition on the document picture to obtain a structure recognition result, so as to generate an electronic document based on the structure recognition result, wherein the structure recognition result includes a field attribute recognition result and a field position recognition result.

2.

发明申请
METHOD FOR TRAINING TEXT POSITIONING MODEL AND METHOD FOR TEXT POSITIONING 有权

公开(公告)号：US20220392242A1

公开(公告)日：2022-12-08

申请号：US17819838

申请日：2022-08-15

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor： Ju HUANG , Yulin LI , Peng WANG , Qunyi XIE , Xiameng QIN , Kun YAO

IPC: G06V30/19 , G06V30/14

Abstract: A method for training a text positioning model includes: obtaining a sample image, where the sample image contains a sample text to be positioned and a text marking box for the sample text; inputting the sample image into a text positioning model to be trained to position the sample text, and outputting a prediction text box for the sample image; obtaining a sample prior anchor box corresponding to the sample image; and adjusting model parameters of the text positioning model based on the sample prior anchor box, the text marking box and the prediction text box, and continuing training the adjusted text positioning model based on a next sample image until model training is completed, to generate a target text positioning model.

3.

发明公开
IMAGE-BASED INFORMATION EXTRACTION MODEL, METHOD, AND APPARATUS, DEVICE, AND STORAGE MEDIUM 审中-公开

公开(公告)号：US20240021000A1

公开(公告)日：2024-01-18

申请号：US18113178

申请日：2023-02-23

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor： Xiameng QIN , Yulin LI , Xiaoqiang ZHANG , Ju HUANG , Qunyi XIE , Kun YAO

IPC: G06V30/19 , G06V30/148

CPC classification number: G06V30/1918 , G06V30/15 , G06V30/19127 , G06V30/19147

Abstract: There is provided an image-based information extraction model, method, and apparatus, a device, and a storage medium, which relates to the field of artificial intelligence (AI) technologies, specifically to fields of deep learning, image processing, computer vision technologies, and is applicable to optical character recognition (OCR) and other scenarios. A specific implementation solution involves: acquiring a to-be-extracted first image and a category of to-be-extracted information; and inputting the first image and the category into a pre-trained information extraction model to perform information extraction on the first image to obtain text information corresponding to the category.

4.

发明公开
CHARACTER DETECTION METHOD AND APPARATUS , MODEL TRAINING METHOD AND APPARATUS, DEVICE AND STORAGE MEDIUM 审中-公开

公开(公告)号：US20230196805A1

公开(公告)日：2023-06-22

申请号：US18168089

申请日：2023-02-13

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor： Ju HUANG , Xiaoqiang ZHANG , Xiameng QIN , Chengquan ZHANG , Kun YAO

IPC: G06V30/10 , G06T7/11 , G06V10/25 , G06V10/44

CPC classification number: G06V30/10 , G06T7/11 , G06V10/25 , G06V10/44

Abstract: The present disclosure provides a character detection method and apparatus, a model training method and apparatus, a device and a storage medium. The specific implementation is: acquiring a training sample, where the training sample includes a sample image and a marked image, and the marked image is an image obtained by marking a text instance in the sample image; inputting the sample image into a character detection model, to obtain segmented images and image types of the segmented images output by the character detection model, where the image type indicates that the segmented image includes a text instance, or the segmented image does not include a text instance; and adjusting a parameter of the character detection model according to the segmented images, the image types of the segmented images and the marked image.

5.

发明申请
IMAGE CLASSIFICATION METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM 有权

公开(公告)号：US20220027611A1

公开(公告)日：2022-01-27

申请号：US17498226

申请日：2021-10-11

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor： Yuechen YU , Chengquan ZHANG , Yulin LI , Xiaoqiang ZHANG , Ju HUANG , Xiameng QIN , Kun YAO , Jingtuo LIU , Junyu HAN , Errui DING

IPC: G06K9/00 , G06K9/62 , G06N3/08

Abstract: Provided are an image classification method and apparatus, an electronic device and a storage medium, relating to the field of artificial intelligence and, in particular, to computer vision and deep learning. The method includes inputting a to-be-classified document image into a pretrained neural network and obtaining a feature submap of each text box of the to-be-classified document image by use of the neural network; inputting the feature submap of each text box, a semantic feature corresponding to preobtained text information of each text box and a position feature corresponding to preobtained position information of each text box into a pretrained multimodal feature fusion model and fusing, by use of the multimodal feature fusion model, the three into a multimodal feature corresponding to each text box; and classifying the to-be-classified document image based on the multimodal feature corresponding to each text box.

6.

发明申请
METHOD AND APPARATUS FOR VISUAL QUESTION ANSWERING, COMPUTER DEVICE AND MEDIUM 有权

公开(公告)号：US20210406592A1

公开(公告)日：2021-12-30

申请号：US17182987

申请日：2021-02-23

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor： Yulin LI , Xiameng QIN , Ju HUANG , Qunyi XIE , Junyu HAN

IPC: G06K9/62 , G06K9/46 , G06F40/279

Abstract: The present disclosure provides a method for visual question answering. The method includes: acquiring an input image and an input question; constructing a visual graph based on the input image, wherein the visual graph comprises a first node feature and a first edge feature; constructing a question graph based on the input question, wherein the question graph comprises a second node feature and a second edge feature; performing a multimodal fusion on the visual graph and the question graph to obtain an updated visual graph and an updated question graph; determining a question feature based on the input question; determining a fusion feature based on the updated visual graph, the updated question graph and the question feature; and generating a predicted answer for the input image and the input question. The present disclosure further provides an apparatus for visual question answering, a computer device and a medium.

7.

发明申请
METHOD OF PROCESSING TASK, ELECTRONIC DEVICE, AND STORAGE MEDIUM 有权

公开(公告)号：US20230134615A1

公开(公告)日：2023-05-04

申请号：US18146839

申请日：2022-12-27

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor： Qunyi XIE , Dongdong ZHANG , Xiameng QIN , Mengyi EN , Yangliu XU , Yi CHEN , Ju HUANG , Kun YAO

IPC: G06F9/48 , G06F40/205 , G06F9/50

Abstract: A method of processing a task, an electronic device, and a storage medium are provided, which relate to a field of artificial intelligence, in particular to fields of deep learning and computer vision, and may be applied to OCR optical character recognition and other scenarios. The method includes: parsing labeled data to be processed according to a task type identification, to obtain task labeled data, a tag information of the task labeled data is matched with the task type identification, and the task labeled data includes first task labeled data and second task labeled data; training a model using the first task labeled data, to obtain candidate models, the model is determined according to the task type identification; and determining a target model from the candidate models according to a performance evaluation result obtained by performing performance evaluation on the plurality of candidate models using the second task labeled data.

8.

发明申请
IMAGE PROCESSING METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM 有权

公开(公告)号：US20220253631A1

公开(公告)日：2022-08-11

申请号：US17501221

申请日：2021-10-14

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor： Yulin LI , Ju HUANG , Qunyi XIE , Xiameng QIN , Chengquan ZHANG , Jingtuo LIU

IPC: G06K9/00 , G06K9/62 , G06N3/04 , G06F40/30

Abstract: The present disclosure discloses an image processing method, an electronic device and a storage medium, and relates to the field of artificial intelligence technologies, and particularly to the fields of computer vision technologies, deep learning technologies, or the like. The image processing method includes: acquiring a multi-modal feature of each of at least one text region in an image, the multi-modal feature including features in plural dimensions; performing a global attention processing operation on the multi-modal feature of each text region to obtain a global attention feature of each text region; determining a category of each text region based on the global attention feature of each text region; and constructing structured information based on text content and the category of each text region.

9.

发明申请
METHOD AND DEVICE FOR VISUAL QUESTION ANSWERING, COMPUTER APPARATUS AND MEDIUM 有权

公开(公告)号：US20210406468A1

公开(公告)日：2021-12-30

申请号：US17161466

申请日：2021-01-28

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor： Xiameng QIN , Yulin LI , Qunyi XIE , Ju HUANG , Junyu HAN

IPC: G06F40/279 , G06N3/08 , G06N3/04 , G06F16/532 , G06F16/583 , G06K9/20 , G06K9/62 , G06K9/46

Abstract: The present disclosure provides a method for visual question answering, which relates to a field of computer vision and natural language processing. The method includes: acquiring an input image and an input question; constructing a Visual Graph based on the input image, wherein the Visual Graph comprises a Node Feature and an Edge Feature; updating the Node Feature by using the Node Feature and the Edge Feature to obtain an updated Visual Graph; determining a question feature based on the input question; fusing the updated Visual Graph and the question feature to obtain a fused feature; and generating a predicted answer for the input image and the input question based on the fused feature. The present disclosure further provides an apparatus for visual question answering, a computer device and a non-transitory computer-readable storage medium.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification