-
Publication No.: US20220415071A1
Publication Date: 2022-12-29
Application No.: US17899712
Filing Date: 2022-08-31
Inventor: Chengquan ZHANG, Pengyuan LV, Shanshan LIU, Meina QIAO, Yangliu XU, Liang WU, Jingtuo LIU, Junyu HAN, Errui DING, Jingdong WANG
IPC: G06V30/19, G06V30/18, G06T9/00, G06V30/262, G06N20/00
Abstract: The present disclosure provides a training method of a text recognition model, a text recognition method, and an apparatus, relating to the technical field of artificial intelligence, and specifically to deep learning and computer vision, which can be applied in scenarios such as optical character recognition (OCR). The specific implementation is: performing mask prediction on visual features of an acquired sample image, where the sample image includes text, to obtain a predicted visual feature; performing mask prediction on semantic features of acquired sample text, to obtain a predicted semantic feature; determining a first loss value of the text of the sample image according to the predicted visual feature; determining a second loss value of the sample text according to the predicted semantic feature; and training, according to the first loss value and the second loss value, to obtain the text recognition model.
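
Illustrative sketch (not taken from the patent text): the abstract describes masking features in a visual branch and a semantic branch, deriving one loss value from each, and training on both. The toy Python below shows one way those pieces could fit together. The zero-fill masking, the cross-entropy loss, the weighted sum, and every name (`mask_positions`, `cross_entropy`, `total_loss`, `w_visual`, `w_semantic`) are assumptions made for illustration; the abstract does not specify how the losses are computed or combined.

```python
import math


def mask_positions(features, masked, fill=0.0):
    """Return a copy of `features` with the vectors at `masked` indices set to `fill`.

    Assumption: masking is modeled as zero-filling whole feature vectors.
    """
    return [[fill] * len(vec) if i in masked else list(vec)
            for i, vec in enumerate(features)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits_per_char, target_ids):
    """Mean negative log-likelihood of the target character ids.

    Assumption: each branch ends in per-character logits over a vocabulary.
    """
    losses = [-math.log(softmax(logits)[target])
              for logits, target in zip(logits_per_char, target_ids)]
    return sum(losses) / len(losses)


def total_loss(first_loss, second_loss, w_visual=1.0, w_semantic=1.0):
    """Combine the two loss values; the weighted sum is a hypothetical choice."""
    return w_visual * first_loss + w_semantic * second_loss


# Toy usage: two characters, vocabulary of three symbols.
visual_features = [[0.3, 0.7], [0.9, 0.1]]
masked_visual = mask_positions(visual_features, masked={1})

visual_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]    # stand-in for the visual branch output
semantic_logits = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # stand-in for the semantic branch output
targets = [0, 1]

first_loss = cross_entropy(visual_logits, targets)
second_loss = cross_entropy(semantic_logits, targets)
objective = total_loss(first_loss, second_loss)
```

In a real system the logits would come from trainable networks fed with the masked features, and `objective` would be minimized by gradient descent; both parts are omitted here.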
-
Publication No.: US20220253631A1
Publication Date: 2022-08-11
Application No.: US17501221
Filing Date: 2021-10-14
Inventor: Yulin LI, Ju HUANG, Qunyi XIE, Xiameng QIN, Chengquan ZHANG, Jingtuo LIU
Abstract: The present disclosure provides an image processing method, an electronic device, and a storage medium, relating to the field of artificial intelligence, and in particular to computer vision and deep learning. The image processing method includes: acquiring a multi-modal feature of each of at least one text region in an image, the multi-modal feature including features in multiple dimensions; performing a global attention processing operation on the multi-modal feature of each text region to obtain a global attention feature of each text region; determining a category of each text region based on the global attention feature of each text region; and constructing structured information based on the text content and the category of each text region.
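
Illustrative sketch (not taken from the patent text): the pipeline in the abstract, from per-region features through global attention and a category per region to structured information, can be approximated with a toy example. The code below assumes scaled dot-product self-attention with identity projections as the "global attention processing operation" and a linear scorer as the classifier. These choices, the labels `key` and `value`, and the names `global_attention`, `classify`, and `build_structured_info` are all hypothetical; the abstract does not name a specific attention or classification mechanism.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(features):
    """Let every text region attend to every region (itself included).

    Assumption: scaled dot-product self-attention with identity Q/K/V projections.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        weights = softmax(scores)
        attended.append([sum(w * value[d] for w, value in zip(weights, features))
                         for d in range(dim)])
    return attended


def classify(feature, class_weights):
    """Pick the label whose (hypothetical) weight vector scores highest."""
    scores = {label: sum(f * w for f, w in zip(feature, weights))
              for label, weights in class_weights.items()}
    return max(scores, key=scores.get)


def build_structured_info(texts, features, class_weights):
    """Group each region's text content under its predicted category."""
    structured = {}
    for text, feature in zip(texts, global_attention(features)):
        structured.setdefault(classify(feature, class_weights), []).append(text)
    return structured


# Toy usage: two regions with 2-dimensional stand-ins for multi-modal features.
texts = ["Name", "Alice"]
features = [[1.0, 0.0], [0.0, 1.0]]
class_weights = {"key": [1.0, 0.0], "value": [0.0, 1.0]}

info = build_structured_info(texts, features, class_weights)
# info == {"key": ["Name"], "value": ["Alice"]}
```

A production system would learn the projections and classifier weights and would build the multi-modal features from visual, textual, and positional cues; those parts are outside this sketch.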
-