-
Publication No.: US20230289402A1
Publication date: 2023-09-14
Application No.: US18055393
Filing date: 2022-11-14
Inventor: Jian WANG , Xiangbo SU , Qiman WU , Zhigang WANG , Hao SUN , Errui DING , Jingdong WANG , Tian WU , Haifeng WANG
IPC: G06K9/62
CPC classification number: G06K9/62 , G06K9/6288
Abstract: Provided are a joint perception model training method, a joint perception method, a device, and a storage medium. The joint perception model training method includes: acquiring sample images and perception tags of the sample images; acquiring a preset joint perception model, where the joint perception model includes a feature extraction network and a joint perception network; performing feature extraction on the sample images through the feature extraction network to obtain target sample features; performing joint perception through the joint perception network according to the target sample features to obtain perception prediction results; and training the preset joint perception model according to the perception prediction results and the perception tags, where the joint perception includes executing at least two perception tasks.
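The flow in this abstract (one shared feature extraction step feeding at least two perception tasks, with training driven by predictions and tags) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the linear "networks", the task names, the squared-error loss, and the function names are hypothetical and are not taken from the patent.

```python
def extract_features(image, backbone):
    """Hypothetical shared feature extraction: one weighted sum per feature."""
    return [sum(w * x for w, x in zip(row, image)) for row in backbone]


def task_head(features, head_weights):
    """Hypothetical linear head giving one scalar prediction for one task."""
    return sum(w * f for w, f in zip(head_weights, features))


def joint_loss(image, tags, backbone, heads):
    """Run every perception task on the same features and sum the errors.

    Squared error and an unweighted sum are assumptions of this sketch.
    """
    features = extract_features(image, backbone)  # computed once, shared
    predictions = {name: task_head(features, w) for name, w in heads.items()}
    loss = sum((predictions[name] - tags[name]) ** 2 for name in heads)
    return loss, predictions


# Toy data: a 2-value "image", an identity backbone, two perception tasks.
image = [1.0, 2.0]
backbone = [[1.0, 0.0], [0.0, 1.0]]
heads = {"detection": [1.0, 1.0], "attribute": [2.0, 0.0]}
tags = {"detection": 3.0, "attribute": 1.0}
loss, predictions = joint_loss(image, tags, backbone, heads)
```

Because the features are computed once and reused by every head, a gradient of the summed loss would update the shared extraction weights from all tasks at the same time, which is the usual motivation for a joint design of this kind.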
-
Publication No.: US20230386168A1
Publication date: 2023-11-30
Application No.: US18192393
Filing date: 2023-03-29
Inventor: Yipeng SUN , Mengjun CHENG , Longchao WANG , Xiongwei ZHU , Kun YAO , Junyu HAN , Jingtuo LIU , Errui DING , Jingdong WANG , Haifeng WANG
IPC: G06V10/42 , G06F16/583 , H04N19/176
CPC classification number: G06V10/42 , G06F16/5846 , H04N19/176
Abstract: A pre-training method for a Vision and Scene Text Aggregation model includes: acquiring a sample image-text pair; extracting a sample scene text from a sample image; inputting a sample text into a text encoding network to obtain a sample text feature; inputting the sample image and an initial sample aggregation feature into a visual encoding subnetwork and inputting the initial sample aggregation feature and the sample scene text into a scene encoding subnetwork to obtain a global image feature of the sample image and a learned sample aggregation feature; and pre-training the Vision and Scene Text Aggregation model according to the sample text feature, the global image feature of the sample image, and the learned sample aggregation feature.
-
Publication No.: US20230215203A1
Publication date: 2023-07-06
Application No.: US18168759
Filing date: 2023-02-14
Inventor: Pengyuan LV , Chengquan ZHANG , Shanshan LIU , Meina QIAO , Yangliu XU , Liang WU , Xiaoyan WANG , Kun YAO , Junyu HAN , Errui DING , Jingdong WANG , Tian WU , Haifeng WANG
IPC: G06V30/19
CPC classification number: G06V30/19147 , G06V30/19167
Abstract: The present disclosure provides a character recognition model training method and apparatus, a character recognition method and apparatus, a device and a medium, relating to the technical field of artificial intelligence, and specifically to the technical fields of deep learning, image processing and computer vision, which can be applied to scenarios such as character detection and recognition technology. The specific implementing solution is: partitioning an untagged training sample into at least two sub-sample images; dividing the at least two sub-sample images into a first training set and a second training set; where the first training set includes a first sub-sample image with a visible attribute, and the second training set includes a second sub-sample image with an invisible attribute; performing self-supervised training on a to-be-trained encoder by taking the second training set as a tag of the first training set, to obtain a target encoder.
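The data-preparation steps in this abstract (partitioning an untagged sample into sub-sample images, then dividing them into a visible set and an invisible set that serves as the training target) can be sketched as follows. This is a minimal illustration under stated assumptions: the fixed-size grid partition, the random split, the `invisible_ratio` parameter, and all function names are hypothetical and are not specified by the patent.

```python
import random


def partition(image, patch_h, patch_w):
    """Split a 2D list into non-overlapping patches in row-major order."""
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows, patch_h):
        for left in range(0, cols, patch_w):
            patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
            patches.append(patch)
    return patches


def split_visible_invisible(patches, invisible_ratio, seed=0):
    """Randomly assign patches to a visible set and an invisible set.

    Each entry keeps its original index so that an encoder could be told
    where a patch came from. The random choice is an assumption.
    """
    rng = random.Random(seed)
    indices = list(range(len(patches)))
    rng.shuffle(indices)
    n_invisible = int(len(patches) * invisible_ratio)
    invisible_idx = sorted(indices[:n_invisible])
    visible_idx = sorted(indices[n_invisible:])
    visible = [(i, patches[i]) for i in visible_idx]      # encoder input
    invisible = [(i, patches[i]) for i in invisible_idx]  # training target
    return visible, invisible


# Toy 4x4 "image" cut into four 2x2 sub-sample images.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = partition(image, 2, 2)
visible, invisible = split_visible_invisible(patches, invisible_ratio=0.5)
```

In a self-supervised setup of this kind, the encoder would see only `visible` and be trained to predict the content of `invisible`, so no manual tags are needed.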
-
Publication No.: US20220415071A1
Publication date: 2022-12-29
Application No.: US17899712
Filing date: 2022-08-31
Inventor: Chengquan ZHANG , Pengyuan LV , Shanshan LIU , Meina QIAO , Yangliu XU , Liang WU , Jingtuo LIU , Junyu HAN , Errui DING , Jingdong WANG
IPC: G06V30/19 , G06V30/18 , G06T9/00 , G06V30/262 , G06N20/00
Abstract: The present disclosure provides a training method of a text recognition model, a text recognition method, and an apparatus, relating to the technical field of artificial intelligence, and specifically, to the technical field of deep learning and computer vision, which can be applied in scenarios such as optical character recognition. The specific implementation solution is: performing mask prediction on visual features of an acquired sample image, to obtain a predicted visual feature; performing mask prediction on semantic features of acquired sample text, to obtain a predicted semantic feature, where the sample image includes text; determining a first loss value of the text of the sample image according to the predicted visual feature; determining a second loss value of the sample text according to the predicted semantic feature; training, according to the first loss value and the second loss value, to obtain the text recognition model.
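The two-loss structure described in this abstract (mask prediction on visual features, mask prediction on semantic features, and training on both loss values) can be sketched in a few lines. The sketch rests on assumptions that the abstract does not state: masking by replacing positions with a fixed value, mean squared error over the masked positions, and a weighted sum of the two losses. All names are hypothetical.

```python
def mask_features(features, masked_positions, mask_value=0.0):
    """Replace the features at the masked positions with a placeholder."""
    return [mask_value if i in masked_positions else f
            for i, f in enumerate(features)]


def masked_mse(predicted, target, masked_positions):
    """Mean squared error computed only over the masked positions."""
    errors = [(predicted[i] - target[i]) ** 2 for i in masked_positions]
    return sum(errors) / len(errors)


def total_loss(visual_loss, semantic_loss, visual_weight=1.0, semantic_weight=1.0):
    """Combine the first (visual) and second (semantic) loss values."""
    return visual_weight * visual_loss + semantic_weight * semantic_loss


# Toy example: the "model" here simply returns its masked input unchanged.
visual_target = [1.0, 2.0, 3.0]
semantic_target = [0.5, 1.5]
visual_pred = mask_features(visual_target, {1})
semantic_pred = mask_features(semantic_target, {0})
first_loss = masked_mse(visual_pred, visual_target, [1])
second_loss = masked_mse(semantic_pred, semantic_target, [0])
combined = total_loss(first_loss, second_loss)
```

Restricting each loss to the masked positions means the model is rewarded only for reconstructing what it could not see, which is the usual design in mask-prediction training.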