-
公开(公告)号:US20230386168A1
公开(公告)日:2023-11-30
申请号:US18192393
申请日:2023-03-29
Inventor: Yipeng SUN , Mengjun CHENG , Longchao WANG , Xiongwei ZHU , Kun YAO , Junyu HAN , Jingtuo LIU , Errui DING , Jingdong WANG , Haifeng Wang
IPC: G06V10/42 , G06F16/583 , H04N19/176
CPC classification number: G06V10/42 , G06F16/5846 , H04N19/176
Abstract: A pre-training method for a Vision and Scene Text Aggregation model includes: acquiring a sample image-text pair; extracting a sample scene text from a sample image; inputting a sample text into a text encoding network to obtain a sample text feature; inputting the sample image and an initial sample aggregation feature into a visual encoding subnetwork and inputting the initial sample aggregation feature and the sample scene text into a scene encoding subnetwork to obtain a global image feature of the sample image and a learned sample aggregation feature; and pre-training the Vision and Scene Text Aggregation model according to the sample text feature, the global image feature of the sample image, and the learned sample aggregation feature.
-
22.
公开(公告)号:US20230260306A1
公开(公告)日:2023-08-17
申请号:US17884264
申请日:2022-08-09
Inventor: Yuechen YU , Chengquan ZHANG , Kun YAO
IPC: G06V30/413 , G06V30/414 , G06V30/416 , G06V30/18
CPC classification number: G06V30/413 , G06V30/414 , G06V30/416 , G06V30/18143
Abstract: A method and an apparatus is provided for recognizing a document image, a storage medium and an electronic device, relates to the technical field of artificial intelligent recognition, particularly relates to the technical fields of deep learning and computer vision. The method includes that a document image to be recognized is transformed into an image feature map, where the document image at least includes at least one text box and text information including multiple characters; a first recognition content of the document image to be recognized is predicted based on the image feature map, the multiple characters and the text box; the document image to be recognized is recognized based on an optical character recognition algorithm to obtain a second recognition content; and the first recognition content is matched with the second recognition content to obtain a target recognition content.
-
23.
公开(公告)号:US20230215203A1
公开(公告)日:2023-07-06
申请号:US18168759
申请日:2023-02-14
Inventor: Pengyuan LV , Chengquan ZHANG , Shanshan LIU , Meina QIAO , Yangliu XU , Liang WU , Xiaoyan WANG , Kun YAO , Junyu Han , Errui DING , Jingdong WANG , Tian WU , Haifeng WANG
IPC: G06V30/19
CPC classification number: G06V30/19147 , G06V30/19167
Abstract: The present disclosure provides a character recognition model training method and apparatus, a character recognition method and apparatus, a device and a medium, relating to the technical field of artificial intelligence, and specifically to the technical fields of deep learning, image processing and computer vision, which can be applied to scenarios such as character detection and recognition technology. The specific implementing solution is: partitioning an untagged training sample into at least two sub-sample images; dividing the at least two sub-sample images into a first training set and a second training set; where the first training set includes a first sub-sample image with a visible attribute, and the second training set includes a second sub-sample image with an invisible attribute; performing self-supervised training on a to-be-trained encoder by taking the second training set as a tag of the first training set, to obtain a target encoder.
-
公开(公告)号:US20230206667A1
公开(公告)日:2023-06-29
申请号:US18147806
申请日:2022-12-29
Inventor: Pengyuan LV , Liang WU , Shanshan LIU , Meina QIAO , Chengquan ZHANG , Kun YAO , Junyu HAN
CPC classification number: G06V30/19127 , G06V30/16
Abstract: A method for recognizing text includes: obtaining a first feature map of an image; for each target feature unit, performing a feature enhancement process on a plurality of feature values of the target feature unit respectively based on the plurality of feature values of the target feature unit, in which the target feature unit is a feature unit in the first feature map along a feature enhancement direction; and performing a text recognition process on the image based on the first feature map after the feature enhancement process.
-
公开(公告)号:US20230134615A1
公开(公告)日:2023-05-04
申请号:US18146839
申请日:2022-12-27
Inventor: Qunyi XIE , Dongdong ZHANG , Xiameng QIN , Mengyi EN , Yangliu XU , Yi CHEN , Ju HUANG , Kun YAO
IPC: G06F9/48 , G06F40/205 , G06F9/50
Abstract: A method of processing a task, an electronic device, and a storage medium are provided, which relate to a field of artificial intelligence, in particular to fields of deep learning and computer vision, and may be applied to OCR optical character recognition and other scenarios. The method includes: parsing labeled data to be processed according to a task type identification, to obtain task labeled data, a tag information of the task labeled data is matched with the task type identification, and the task labeled data includes first task labeled data and second task labeled data; training a model using the first task labeled data, to obtain candidate models, the model is determined according to the task type identification; and determining a target model from the candidate models according to a performance evaluation result obtained by performing performance evaluation on the plurality of candidate models using the second task labeled data.
-
-
-
-