Patent search ap:("BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.") AND inv:"Xiameng QIN"

11.

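Invention publication
METHOD OF TRAINING TEXT DETECTION MODEL, METHOD OF DETECTING TEXT, AND DEVICE (under examination, published)

Publication (announcement) number: US20240265718A1

Publication (announcement) date: 2024-08-08

Application number: US18041370

Application date: 2022-04-22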

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Xiaoqiang ZHANG, Xiameng QIN, Chengquan ZHANG, Kun YAO

IPC: G06V30/19, G06V10/77, G06V10/82

CPC classification number: G06V30/19127, G06V10/7715, G06V10/82

Abstract: A method of training a text detection model and a method of detecting text are provided. The training method includes: inputting a sample image into a text feature extraction sub-model of a text detection model to obtain a text feature of a text in the sample image, the sample image having a label indicating actual position information and an actual category; inputting a predetermined text vector into a text encoding sub-model of the text detection model to obtain a text reference feature; inputting the text feature and the text reference feature into a decoding sub-model of the text detection model to obtain a text sequence vector; inputting the text sequence vector into an output sub-model of the text detection model to obtain predicted position information and a predicted category; and training the text detection model based on the predicted and actual categories and the predicted and actual position information.
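The data flow in this abstract can be sketched roughly as follows. This is an illustrative sketch only: the four sub-models are passed in as plain callables, and the loss (cross-entropy on the category plus an L1 distance on the position) is an assumption, since the abstract does not say how the predicted and actual values are compared. All function and parameter names are hypothetical.

```python
import math

def forward(sample_image, text_vector,
            feature_extractor, text_encoder, decoder, output_head):
    """Chain the four sub-models in the order the abstract lists them."""
    text_feature = feature_extractor(sample_image)        # text feature extraction sub-model
    text_reference = text_encoder(text_vector)            # text encoding sub-model
    sequence_vector = decoder(text_feature, text_reference)  # decoding sub-model
    return output_head(sequence_vector)                   # (predicted position, predicted category scores)

def detection_loss(pred_probs, actual_category, pred_box, actual_box, box_weight=1.0):
    """Assumed training signal: cross-entropy on category + weighted L1 on position."""
    category_loss = -math.log(pred_probs[actual_category])
    position_loss = sum(abs(p - a) for p, a in zip(pred_box, actual_box))
    return category_loss + box_weight * position_loss
```

In an actual training loop the loss value would be back-propagated through all four sub-models; that step is omitted here because it depends on the framework used.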

12.

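Invention application
METHOD OF PROCESSING TASK, ELECTRONIC DEVICE, AND STORAGE MEDIUM (granted)

Publication (announcement) number: US20230134615A1

Publication (announcement) date: 2023-05-04

Application number: US18146839

Application date: 2022-12-27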

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Qunyi XIE, Dongdong ZHANG, Xiameng QIN, Mengyi EN, Yangliu XU, Yi CHEN, Ju HUANG, Kun YAO

IPC: G06F9/48, G06F40/205, G06F9/50

Abstract: A method of processing a task, an electronic device, and a storage medium are provided, which relate to the field of artificial intelligence, in particular to the fields of deep learning and computer vision, and may be applied to optical character recognition (OCR) and other scenarios. The method includes: parsing labeled data to be processed according to a task type identification to obtain task labeled data, wherein tag information of the task labeled data matches the task type identification, and the task labeled data includes first task labeled data and second task labeled data; training a model using the first task labeled data to obtain a plurality of candidate models, wherein the model is determined according to the task type identification; and determining a target model from the candidate models according to a performance evaluation result obtained by evaluating the performance of the candidate models using the second task labeled data.
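The parse, train, evaluate, and select steps in this abstract can be sketched as below. This is a toy sketch under stated assumptions, not the patented implementation: the record layout (`tag`, `x`, `y`), the two threshold classifiers standing in for candidate models, and accuracy as the performance metric are all hypothetical choices made for illustration.

```python
from statistics import mean, median

def parse_labeled_data(records, task_type_id):
    """Keep only records whose tag matches the task type identification."""
    return [r for r in records if r["tag"] == task_type_id]

def split_task_data(task_data, train_fraction=0.5):
    """Split into first (training) and second (evaluation) task labeled data."""
    cut = int(len(task_data) * train_fraction)
    return task_data[:cut], task_data[cut:]

def train_candidates(train_data):
    """Fit two toy candidates: classify x as 1 when it exceeds a fitted threshold."""
    xs = [r["x"] for r in train_data]
    thresholds = {"mean": mean(xs), "median": median(xs)}
    return {name: (lambda x, t=t: int(x > t)) for name, t in thresholds.items()}

def accuracy(model, eval_data):
    """Assumed performance metric: fraction of correct predictions."""
    return sum(model(r["x"]) == r["y"] for r in eval_data) / len(eval_data)

def select_target_model(candidates, eval_data):
    """Return (name, model) of the candidate that scores best on the evaluation data."""
    return max(candidates.items(), key=lambda kv: accuracy(kv[1], eval_data))
```

Keeping the evaluation split separate from the training split is what lets the selection step compare candidates on data none of them were fitted to.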

13.

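Invention application
IMAGE PROCESSING METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM (granted)

Publication (announcement) number: US20220253631A1

Publication (announcement) date: 2022-08-11

Application number: US17501221

Application date: 2021-10-14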

Applicant: BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor: Yulin LI, Ju HUANG, Qunyi XIE, Xiameng QIN, Chengquan ZHANG, Jingtuo LIU

IPC: G06K9/00, G06K9/62, G06N3/04, G06F40/30

Abstract: The present disclosure discloses an image processing method, an electronic device and a storage medium, and relates to the field of artificial intelligence technologies, and particularly to the fields of computer vision technologies, deep learning technologies, or the like. The image processing method includes: acquiring a multi-modal feature of each of at least one text region in an image, the multi-modal feature including features in plural dimensions; performing a global attention processing operation on the multi-modal feature of each text region to obtain a global attention feature of each text region; determining a category of each text region based on the global attention feature of each text region; and constructing structured information based on text content and the category of each text region.
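As a rough illustration of the steps in this abstract, the sketch below applies unparameterised dot-product attention across all text regions, classifies each attended feature, and groups text by category. Everything specific here is an assumption: the abstract does not say which attention variant is used, and the prototype-based classifier, the region layout (`feature`, `text`), and all names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def global_attention(features):
    """Each region attends to every region (scaled dot-product, no learned weights)."""
    dim = len(features[0])
    attended = []
    for query in features:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in features])
        attended.append([sum(w * v[i] for w, v in zip(weights, features))
                         for i in range(dim)])
    return attended

def classify(feature, prototypes):
    """Assumed classifier: pick the category whose prototype has the largest dot product."""
    return max(prototypes, key=lambda c: dot(feature, prototypes[c]))

def build_structured_info(regions, prototypes):
    """Group each region's text content under its predicted category."""
    attended = global_attention([r["feature"] for r in regions])
    info = {}
    for region, feature in zip(regions, attended):
        info.setdefault(classify(feature, prototypes), []).append(region["text"])
    return info
```

A real system would build the multi-modal feature from several sources (for example visual, positional, and textual embeddings) and learn the attention and classification weights; here the features are taken as given.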

14.

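Invention application
METHOD AND DEVICE FOR VISUAL QUESTION ANSWERING, COMPUTER APPARATUS AND MEDIUM (granted)

Publication (announcement) number: US20210406468A1

Publication (announcement) date: 2021-12-30

Application number: US17161466

Application date: 2021-01-28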

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Xiameng QIN, Yulin LI, Qunyi XIE, Ju HUANG, Junyu HAN

IPC: G06F40/279, G06N3/08, G06N3/04, G06F16/532, G06F16/583, G06K9/20, G06K9/62, G06K9/46

Abstract: The present disclosure provides a method for visual question answering, which relates to the fields of computer vision and natural language processing. The method includes: acquiring an input image and an input question; constructing a Visual Graph based on the input image, wherein the Visual Graph comprises a Node Feature and an Edge Feature; updating the Node Feature by using the Node Feature and the Edge Feature to obtain an updated Visual Graph; determining a question feature based on the input question; fusing the updated Visual Graph and the question feature to obtain a fused feature; and generating a predicted answer for the input image and the input question based on the fused feature. The present disclosure further provides an apparatus for visual question answering, a computer device and a non-transitory computer-readable storage medium.
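The graph update and fusion steps in this abstract might look like the toy sketch below. It rests on several assumptions the abstract does not confirm: the Edge Feature is reduced to one scalar weight per node pair, the update is a single round of weighted-mean message passing, fusion is mean pooling followed by an element-wise product, and the answer is picked by dot product against hypothetical answer embeddings.

```python
def update_nodes(nodes, edges):
    """One message-passing round: add the edge-weighted mean of neighbours to each node."""
    count, dim = len(nodes), len(nodes[0])
    updated = []
    for i in range(count):
        total = sum(edges[i]) or 1.0  # avoid dividing by zero for isolated nodes
        message = [sum(edges[i][j] * nodes[j][d] for j in range(count)) / total
                   for d in range(dim)]
        updated.append([nodes[i][d] + message[d] for d in range(dim)])
    return updated

def fuse(nodes, question_feature):
    """Mean-pool the node features, then multiply element-wise with the question feature."""
    pooled = [sum(node[d] for node in nodes) / len(nodes)
              for d in range(len(question_feature))]
    return [p * q for p, q in zip(pooled, question_feature)]

def predict_answer(fused, answer_embeddings):
    """Pick the answer whose (hypothetical) embedding best matches the fused feature."""
    return max(answer_embeddings,
               key=lambda a: sum(x * y for x, y in zip(fused, answer_embeddings[a])))
```

In practice the node features would come from detected image regions and the question feature from a text encoder; both are supplied directly as vectors here.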
