-
Publication No.: US20230020022A1
Publication date: 2023-01-19
Application No.: US17885882
Filing date: 2022-08-11
Inventor: Shanshan LIU , Meina QIAO , Liang WU , Chengquan ZHANG , Kun YAO
Abstract: A method of recognizing text, which relates to the field of artificial intelligence, in particular to computer vision and deep learning, and may be applied to optical character recognition or other applications. The method includes: acquiring a plurality of image sequences by continuously scanning a document; performing image stitching to obtain a plurality of successive frames of stitched images corresponding to the plurality of image sequences respectively, wherein an overlapping region exists between each two successive frames of stitched images; performing text recognition based on the plurality of successive frames of stitched images to obtain a plurality of corresponding recognition results; and performing de-duplication on the plurality of recognition results based on the overlapping region between each two successive frames of stitched images to obtain a text recognition result for the document.
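Illustrative sketch (not taken from the patent): the de-duplication step can be pictured as merging per-frame recognition strings whose ends overlap. The abstract de-duplicates based on the overlapping image region; the sketch below substitutes a simpler text-level heuristic (longest suffix/prefix match), and the function name `merge_overlapping` and the parameter `min_overlap` are hypothetical.

```python
def merge_overlapping(results, min_overlap=1):
    """Concatenate recognition results, dropping text repeated across frames.

    Assumption: overlap is detected on the recognized strings themselves,
    not on the image overlap region the abstract describes.
    """
    merged = ""
    for text in results:
        if not merged:
            merged = text
            continue
        overlap = 0
        # Try the longest possible overlap first, then shrink.
        for k in range(min(len(merged), len(text)), min_overlap - 1, -1):
            if merged.endswith(text[:k]):
                overlap = k
                break
        # Append only the part of this frame that is not already present.
        merged += text[overlap:]
    return merged


frames = ["The quick br", "ick brown f", "wn fox"]
document_text = merge_overlapping(frames)
```

A text-level match like this can wrongly merge genuinely repeated characters, which is one reason a real system might rely on the known image overlap instead.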
-
Publication No.: US20220392243A1
Publication date: 2022-12-08
Application No.: US17890629
Filing date: 2022-08-18
Inventor: Shanshan LIU , Meina QIAO , Liang WU , Pengyuan LYU , Sen FAN , Chengquan ZHANG , Kun YAO
Abstract: A method for training a text classification model and an electronic device are provided. The method may include: acquiring a set of to-be-trained images, the set of to-be-trained images including at least one sample image; determining predicted position information and predicted attribute information of each text line in each sample image based on each sample image; and training to obtain the text classification model, based on the annotation position information and the annotation attribute information of each text line in each sample image, and the predicted position information and the predicted attribute information of each text line in each sample image, wherein the text classification model is used to detect attribute information of each text line in a to-be-recognized image.
-
Publication No.: US20230401828A1
Publication date: 2023-12-14
Application No.: US17905965
Filing date: 2022-04-08
Inventor: Meina QIAO , Shanshan LIU , Xiameng QIN , Chengquan ZHANG , Kun YAO
IPC: G06V10/774 , G06V30/14 , G06V10/764
CPC classification number: G06V10/774 , G06V10/764 , G06V30/1444
Abstract: A method for training an image recognition model includes: obtaining a training data set, in which the training data set includes first text images of each vertical category in a non-target scene and second text images of each vertical category in a target scene, and a type of text content involved in the first text images is the same as a type of text content involved in the second text images; training an initial recognition model by using the first text images, to obtain a basic recognition model; and modifying the basic recognition model by using the second text images, to obtain an image recognition model corresponding to the target scene.
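Illustrative sketch (not taken from the patent): the two-stage idea, training a basic model on non-target data and then modifying it with target data, can be shown on a toy one-parameter regression. Everything here is an assumption for illustration: the model `y = weight * x`, the data values, the learning rates, and the helper names `train` and `mse`. A real recognition model would be a neural network trained on images.

```python
def train(weight, samples, lr, epochs):
    """Plain gradient descent on mean squared error for y = weight * x."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in samples) / len(samples)
        weight -= lr * grad
    return weight


def mse(weight, samples):
    """Mean squared error of the toy model on a sample set."""
    return sum((weight * x - y) ** 2 for x, y in samples) / len(samples)


# Hypothetical data: plentiful "non-target scene" samples, few "target scene" samples.
non_target = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # follows y = 2.0 * x
target = [(1.0, 2.2), (2.0, 4.4)]                   # follows y = 2.2 * x

# Stage 1: train the initial model on non-target data to get a basic model.
base_weight = train(0.0, non_target, lr=0.05, epochs=200)

# Stage 2: start from the basic model and adapt it with target data.
adapted_weight = train(base_weight, target, lr=0.01, epochs=50)
```

Starting stage 2 from `base_weight` rather than from scratch is the point of the scheme: the small target set only has to correct the difference between scenes.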
-
Publication No.: US20230186664A1
Publication date: 2023-06-15
Application No.: US18169032
Filing date: 2023-02-14
Inventor: Shanshan LIU , Meina QIAO , Liang WU , Pengyuan LV , Sen FAN , Chengquan ZHANG , Kun YAO
CPC classification number: G06V30/19173 , G06V30/19147 , G06V30/30
Abstract: A method for text recognition is disclosed. The method includes obtaining a whole-image scenario for an image to be processed and a text image in the image to be processed. The method further includes determining a first text recognition model corresponding to the whole-image scenario. The method further includes performing text recognition on the text image according to the first text recognition model to obtain text information.
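Illustrative sketch (not taken from the patent): selecting a recognition model according to the whole-image scenario can be modeled as a lookup in a registry of recognizers. The scenario labels, the stub recognizers, the function name `recognize_text`, and the fallback model are all hypothetical; the abstract does not say what happens for an unknown scenario.

```python
from typing import Callable, Dict

Recognizer = Callable[[str], str]


def recognize_text(scenario: str, text_image: str,
                   models: Dict[str, Recognizer],
                   fallback: Recognizer) -> str:
    """Pick the recognizer registered for the scenario and apply it.

    Assumption: unknown scenarios fall back to a general-purpose model.
    """
    model = models.get(scenario, fallback)
    return model(text_image)


# Stub recognizers standing in for real scenario-specific models.
models: Dict[str, Recognizer] = {
    "receipt": lambda image: "receipt-model(" + image + ")",
    "street_sign": lambda image: "sign-model(" + image + ")",
}
general: Recognizer = lambda image: "general-model(" + image + ")"

result = recognize_text("receipt", "crop_01", models, general)
```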
-
Publication No.: US20230050079A1
Publication date: 2023-02-16
Application No.: US17974630
Filing date: 2022-10-27
Inventor: Pengyuan LV , Xiaoyan WANG , Liang WU , Shanshan LIU , Yuechen YU , Meina QIAO , Jie LU , Chengquan ZHANG , Kun YAO
IPC: G06V30/18 , G06V30/148
Abstract: Provided are a text recognition method, an electronic device, and a non-transitory computer-readable storage medium, which are applicable in an OCR scenario. In the particular solution, a text image to be recognized is acquired. Feature extraction is performed on the text image, to obtain an image feature corresponding to the text image, where a height-wise feature and a width-wise feature of the image feature each have a dimension greater than 1. According to the image feature, sampling features corresponding to multiple sampling points in the text image are determined. According to the sampling features corresponding to the multiple sampling points, a character recognition result corresponding to the text image is determined.
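Illustrative sketch (not taken from the patent): reading a sampling feature at a fractional position of a two-dimensional feature map is commonly done with bilinear interpolation. The abstract does not state which interpolation is used, so bilinear sampling is an assumption here, as are the function name `bilinear_sample` and the use of one scalar per cell (real feature maps hold a vector per cell).

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at (y, x)."""
    rows, cols = len(feature), len(feature[0])
    # Clamp the sampling point to the valid area of the map.
    y = min(max(y, 0.0), rows - 1)
    x = min(max(x, 0.0), cols - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, rows - 1), min(x0 + 1, cols - 1)
    wy, wx = y - y0, x - x0
    # Interpolate along the width first, then along the height.
    top = feature[y0][x0] * (1 - wx) + feature[y0][x1] * wx
    bottom = feature[y1][x0] * (1 - wx) + feature[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


feature_map = [[0.0, 1.0],
               [2.0, 3.0]]
sampling_points = [(0.5, 0.5), (0.0, 1.0), (1.0, 0.0)]
sampling_features = [bilinear_sample(feature_map, y, x) for y, x in sampling_points]
```

Keeping the height dimension greater than 1, as the abstract requires, is what makes the vertical coordinate of a sampling point meaningful at all.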
-
Publication No.: US20210406619A1
Publication date: 2021-12-30
Application No.: US17169112
Filing date: 2021-02-05
Inventor: Pengyuan LV , Xiaoqiang ZHANG , Shanshan LIU , Chengquan ZHANG , Qiming PENG , Sijin WU , Hua LU , Yongfeng CHEN
IPC: G06K9/72 , G06T7/70 , G06F40/30 , G06K9/46 , G06K9/00 , G06K9/32 , G06K9/20 , G06K9/62 , G06N20/00 , G06N5/04
Abstract: The present disclosure provides a method for visual question answering, which relates to fields of computer vision and natural language processing. The method includes: acquiring an input image and an input question; detecting visual information and position information of each of at least one text region in the input image; determining semantic information and attribute information of each of the at least one text region based on the visual information and the position information; determining a global feature of the input image based on the visual information, the position information, the semantic information, and the attribute information; determining a question feature based on the input question; and generating a predicted answer for the input image and the input question based on the global feature and the question feature. The present disclosure further provides a device for visual question answering, a computer device and a medium.
-
Publication No.: US20240282024A1
Publication date: 2024-08-22
Application No.: US18041206
Filing date: 2022-04-22
Inventor: Liang WU , Shanshan LIU , Chengquan ZHANG , Kun YAO
IPC: G06T11/60 , G06F40/58 , G06N3/094 , G06T3/02 , G06V10/774
CPC classification number: G06T11/60 , G06F40/58 , G06N3/094 , G06T3/02 , G06V10/774
Abstract: A method of training a text erasure model, a method of displaying a translation, an electronic device, and a storage medium are provided. The training method includes: processing a set of original text block images by using a generator of a generative adversarial network model to obtain a set of simulated text block-erased images; alternately training the generator and a discriminator of the generative adversarial network model by using a set of real text block-erased images and the set of simulated text block-erased images, so as to obtain a trained generator and a trained discriminator; and determining the trained generator as the text erasure model, wherein a pixel value of a text-erased region in a real text block-erased image contained in the set of real text block-erased images is determined based on a pixel value of another region in the real text block-erased image other than the text-erased region.
-
Publication No.: US20230215203A1
Publication date: 2023-07-06
Application No.: US18168759
Filing date: 2023-02-14
Inventor: Pengyuan LV , Chengquan ZHANG , Shanshan LIU , Meina QIAO , Yangliu XU , Liang WU , Xiaoyan WANG , Kun YAO , Junyu HAN , Errui DING , Jingdong WANG , Tian WU , Haifeng WANG
IPC: G06V30/19
CPC classification number: G06V30/19147 , G06V30/19167
Abstract: The present disclosure provides a character recognition model training method and apparatus, a character recognition method and apparatus, a device and a medium, relating to the technical field of artificial intelligence, and specifically to the technical fields of deep learning, image processing and computer vision, which can be applied to scenarios such as character detection and recognition technology. The specific implementing solution is: partitioning an untagged training sample into at least two sub-sample images; dividing the at least two sub-sample images into a first training set and a second training set; where the first training set includes a first sub-sample image with a visible attribute, and the second training set includes a second sub-sample image with an invisible attribute; performing self-supervised training on a to-be-trained encoder by taking the second training set as a tag of the first training set, to obtain a target encoder.
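Illustrative sketch (not taken from the patent): partitioning an untagged sample into sub-sample images and dividing them into a visible set and an invisible set resembles the patch masking used in masked-image pretraining. The patch size, the mask ratio, the random selection, and the names `partition` and `split_visible_invisible` are assumptions; the abstract does not specify how the split is chosen.

```python
import random


def partition(image, patch_h, patch_w):
    """Cut a 2D image (list of rows) into non-overlapping patches, row-major."""
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows, patch_h):
        for left in range(0, cols, patch_w):
            patches.append([row[left:left + patch_w]
                            for row in image[top:top + patch_h]])
    return patches


def split_visible_invisible(patches, mask_ratio=0.5, seed=0):
    """Return (visible_indices, invisible_indices) for the patch list.

    Assumption: invisible patches are picked uniformly at random.
    """
    rng = random.Random(seed)
    indices = list(range(len(patches)))
    rng.shuffle(indices)
    n_invisible = int(len(patches) * mask_ratio)
    invisible = sorted(indices[:n_invisible])
    visible = sorted(indices[n_invisible:])
    return visible, invisible


# A hypothetical 4x8 "text line" image cut into four 4x2 vertical strips.
image = [[r * 8 + c for c in range(8)] for r in range(4)]
patches = partition(image, patch_h=4, patch_w=2)
visible, invisible = split_visible_invisible(patches, mask_ratio=0.5, seed=0)
```

In self-supervised training of this kind, the encoder would see only the patches in `visible`, and the patches in `invisible` would serve as the prediction target.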
-
Publication No.: US20230206667A1
Publication date: 2023-06-29
Application No.: US18147806
Filing date: 2022-12-29
Inventor: Pengyuan LV , Liang WU , Shanshan LIU , Meina QIAO , Chengquan ZHANG , Kun YAO , Junyu HAN
CPC classification number: G06V30/19127 , G06V30/16
Abstract: A method for recognizing text includes: obtaining a first feature map of an image; for each target feature unit, performing a feature enhancement process on a plurality of feature values of the target feature unit respectively based on the plurality of feature values of the target feature unit, in which the target feature unit is a feature unit in the first feature map along a feature enhancement direction; and performing a text recognition process on the image based on the first feature map after the feature enhancement process.
-
Publication No.: US20220415071A1
Publication date: 2022-12-29
Application No.: US17899712
Filing date: 2022-08-31
Inventor: Chengquan ZHANG , Pengyuan LV , Shanshan LIU , Meina QIAO , Yangliu XU , Liang WU , Jingtuo LIU , Junyu HAN , Errui DING , Jingdong WANG
IPC: G06V30/19 , G06V30/18 , G06T9/00 , G06V30/262 , G06N20/00
Abstract: The present disclosure provides a training method of a text recognition model, a text recognition method, and an apparatus, relating to the technical field of artificial intelligence, and specifically to the technical fields of deep learning and computer vision, which can be applied in scenarios such as optical character recognition. The specific implementation solution is: performing mask prediction on visual features of an acquired sample image, to obtain a predicted visual feature; performing mask prediction on semantic features of acquired sample text, to obtain a predicted semantic feature, where the sample image includes text; determining a first loss value of the text of the sample image according to the predicted visual feature; determining a second loss value of the sample text according to the predicted semantic feature; and training, according to the first loss value and the second loss value, to obtain the text recognition model.