-
Publication number: US20240265718A1
Publication date: 2024-08-08
Application number: US18041370
Filing date: 2022-04-22
Inventor: Xiaoqiang ZHANG, Xiameng QIN, Chengquan ZHANG, Kun YAO
CPC classification number: G06V30/19127, G06V10/7715, G06V10/82
Abstract: A method of training a text detection model and a method of detecting text are provided. The training method includes: inputting a sample image into a text feature extraction sub-model of the text detection model to obtain a text feature of text in the sample image, the sample image having a label indicating actual position information and an actual category; inputting a predetermined text vector into a text encoding sub-model of the text detection model to obtain a text reference feature; inputting the text feature and the text reference feature into a decoding sub-model of the text detection model to obtain a text sequence vector; inputting the text sequence vector into an output sub-model of the text detection model to obtain predicted position information and a predicted category; and training the text detection model based on the predicted and actual categories and on the predicted and actual position information.
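The last step of the abstract trains on two signals: category and position. The abstract does not name the loss functions, so the following is only a minimal sketch that assumes a softmax cross-entropy term for the category and a mean absolute (L1) term for the position. The function names and the `box_weight` parameter are hypothetical.

```python
import math

def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one prediction, computed in a numerically stable way."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target_index]

def l1_distance(predicted_box, actual_box):
    """Mean absolute difference between two boxes given as coordinate lists."""
    return sum(abs(p - a) for p, a in zip(predicted_box, actual_box)) / len(actual_box)

def detection_loss(pred_logits, pred_box, actual_category, actual_box, box_weight=1.0):
    """Assumed combined objective: category term plus weighted position term."""
    category_term = cross_entropy(pred_logits, actual_category)
    position_term = l1_distance(pred_box, actual_box)
    return category_term + box_weight * position_term

# Same category prediction, but the second box is shifted away from the label.
actual_box = [0.1, 0.2, 0.5, 0.6]
perfect = detection_loss([10.0, -10.0], [0.1, 0.2, 0.5, 0.6], 0, actual_box)
shifted = detection_loss([10.0, -10.0], [0.3, 0.2, 0.5, 0.6], 0, actual_box)
```

Under these assumptions, `shifted` is larger than `perfect`, so a gradient-based trainer would be pushed toward the labeled position.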
-
Publication number: US20230134615A1
Publication date: 2023-05-04
Application number: US18146839
Filing date: 2022-12-27
Inventor: Qunyi XIE, Dongdong ZHANG, Xiameng QIN, Mengyi EN, Yangliu XU, Yi CHEN, Ju HUANG, Kun YAO
IPC: G06F9/48, G06F40/205, G06F9/50
Abstract: A method of processing a task, an electronic device, and a storage medium are provided, which relate to the field of artificial intelligence, in particular to the fields of deep learning and computer vision, and may be applied to optical character recognition (OCR) and other scenarios. The method includes: parsing labeled data to be processed according to a task type identification to obtain task labeled data, wherein tag information of the task labeled data matches the task type identification, and the task labeled data includes first task labeled data and second task labeled data; training a model using the first task labeled data to obtain a plurality of candidate models, wherein the model is determined according to the task type identification; and determining a target model from the candidate models according to a performance evaluation result obtained by evaluating the plurality of candidate models using the second task labeled data.
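The parse, train, and select flow described in this abstract can be sketched as follows. This is an illustration only: the record layout (`tag`, `x`, `y`), the `TRAINERS` registry, the toy threshold classifier, and accuracy as the performance metric are all assumptions and do not come from the patent.

```python
def parse_labeled_data(records, task_type, first_count):
    """Keep records whose tag matches the task type, then split them in two."""
    matched = [r for r in records if r["tag"] == task_type]
    return matched[:first_count], matched[first_count:]

def train_threshold_classifier(train_data, offset):
    """Toy trainer: threshold at the midpoint of the class means, shifted by offset."""
    zeros = [r["x"] for r in train_data if r["y"] == 0]
    ones = [r["x"] for r in train_data if r["y"] == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2 + offset
    return lambda x: int(x >= threshold)

# Hypothetical registry: the task type identification decides which model is trained.
TRAINERS = {"cls": train_threshold_classifier}

def accuracy(model, eval_data):
    """Fraction of evaluation records the model labels correctly."""
    return sum(model(r["x"]) == r["y"] for r in eval_data) / len(eval_data)

def select_target_model(records, task_type, offsets, first_count):
    """Train one candidate per offset on the first split, pick the best on the second."""
    first, second = parse_labeled_data(records, task_type, first_count)
    trainer = TRAINERS[task_type]
    candidates = [(offset, trainer(first, offset)) for offset in offsets]
    return max(candidates, key=lambda item: accuracy(item[1], second))

RECORDS = [
    {"tag": "cls", "x": 0.1, "y": 0},
    {"tag": "cls", "x": 0.2, "y": 0},
    {"tag": "cls", "x": 0.8, "y": 1},
    {"tag": "cls", "x": 0.9, "y": 1},
    {"tag": "det", "x": 0.5, "y": 1},  # different task type, filtered out
    {"tag": "cls", "x": 0.3, "y": 0},
    {"tag": "cls", "x": 0.7, "y": 1},
]
best_offset, best_model = select_target_model(RECORDS, "cls", [-0.3, 0.0, 0.3], 4)
```

Here the candidates differ only in one hyperparameter (`offset`). A real system would presumably vary architectures or training settings, which the abstract leaves open.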
-
Publication number: US20220253631A1
Publication date: 2022-08-11
Application number: US17501221
Filing date: 2021-10-14
Inventor: Yulin LI, Ju HUANG, Qunyi XIE, Xiameng QIN, Chengquan ZHANG, Jingtuo LIU
Abstract: The present disclosure provides an image processing method, an electronic device and a storage medium, and relates to the field of artificial intelligence technologies, particularly to the fields of computer vision technologies, deep learning technologies, or the like. The image processing method includes: acquiring a multi-modal feature of each of at least one text region in an image, the multi-modal feature including features in a plurality of dimensions; performing a global attention processing operation on the multi-modal feature of each text region to obtain a global attention feature of each text region; determining a category of each text region based on the global attention feature of each text region; and constructing structured information based on text content and the category of each text region.
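As a rough illustration of the attention, classification, and structuring steps, the sketch below lets every text region attend to every other region using plain scaled dot-product attention without learned weights. The patent does not state that this is its attention operation. The nearest-prototype classifier, the pre-fused feature vectors, and all names here are hypothetical stand-ins.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention(features):
    """Each region attends to all regions (scaled dot-product, no learned weights)."""
    scale = math.sqrt(len(features[0]))
    attended = []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        weights = softmax(scores)
        attended.append([
            sum(w * f[d] for w, f in zip(weights, features))
            for d in range(len(query))
        ])
    return attended

def classify(feature, prototypes):
    """Pick the category whose prototype has the largest dot product (assumed classifier)."""
    return max(prototypes, key=lambda name: sum(a * b for a, b in zip(feature, prototypes[name])))

def build_structured_info(regions, prototypes):
    """Pair each region's text content with its predicted category."""
    attended = global_attention([r["feature"] for r in regions])
    return [
        {"category": classify(f, prototypes), "text": r["text"]}
        for f, r in zip(attended, regions)
    ]

# Toy input: features are assumed to be already fused multi-modal vectors.
regions = [
    {"text": "Name", "feature": [4.0, 0.0]},
    {"text": "Alice", "feature": [0.0, 4.0]},
]
prototypes = {"key": [1.0, 0.0], "value": [0.0, 1.0]}
structured = build_structured_info(regions, prototypes)
```

A trained model would use learned query, key, and value projections and a learned classification head. The sketch only shows how the data flows between the steps.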
-
Publication number: US20210406468A1
Publication date: 2021-12-30
Application number: US17161466
Filing date: 2021-01-28
Inventor: Xiameng QIN, Yulin LI, Qunyi XIE, Ju HUANG, Junyu HAN
IPC: G06F40/279, G06N3/08, G06N3/04, G06F16/532, G06F16/583, G06K9/20, G06K9/62, G06K9/46
Abstract: The present disclosure provides a method for visual question answering, which relates to the fields of computer vision and natural language processing. The method includes: acquiring an input image and an input question; constructing a Visual Graph based on the input image, wherein the Visual Graph comprises a Node Feature and an Edge Feature; updating the Node Feature by using the Node Feature and the Edge Feature to obtain an updated Visual Graph; determining a question feature based on the input question; fusing the updated Visual Graph and the question feature to obtain a fused feature; and generating a predicted answer for the input image and the input question based on the fused feature. The present disclosure further provides an apparatus for visual question answering, a computer device and a non-transitory computer-readable storage medium.
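The node update and fusion steps can be pictured with a small message-passing sketch. It assumes scalar edge features used as weights, a residual update, mean pooling over nodes, and element-wise multiplication for fusion. None of these choices is stated in the abstract, and the function names are hypothetical.

```python
def update_nodes(node_features, edge_features):
    """One message-passing round: each node adds the edge-weighted mean of its neighbours."""
    count = len(node_features)
    dim = len(node_features[0])
    updated = []
    for i in range(count):
        total_weight = sum(edge_features[i][j] for j in range(count) if j != i)
        message = [0.0] * dim
        if total_weight > 0:
            for j in range(count):
                if j != i:
                    for d in range(dim):
                        message[d] += edge_features[i][j] * node_features[j][d] / total_weight
        updated.append([node_features[i][d] + message[d] for d in range(dim)])
    return updated

def fuse(nodes, question_feature):
    """Mean-pool the node features, then multiply element-wise with the question feature."""
    pooled = [sum(node[d] for node in nodes) / len(nodes) for d in range(len(question_feature))]
    return [p * q for p, q in zip(pooled, question_feature)]

def predict_answer(fused, answers):
    """Return the answer whose index has the largest fused score (assumed output head)."""
    return answers[max(range(len(fused)), key=lambda d: fused[d])]

# Toy Visual Graph with two nodes connected by a single unit-weight edge.
nodes = [[1.0, 0.0], [0.0, 1.0]]
edges = [[0.0, 1.0], [1.0, 0.0]]
updated_nodes = update_nodes(nodes, edges)
fused_feature = fuse(updated_nodes, [0.2, 0.9])
answer = predict_answer(fused_feature, ["cat", "dog"])
```

In this toy case each node absorbs its neighbour's feature, so the question feature alone decides the answer. With richer graphs the updated node features would shift the fused scores as well.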
-