-
Publication number: US20230177821A1
Publication date: 2023-06-08
Application number: US18063564
Application date: 2022-12-08
Inventor: Qiming PENG, Bin LUO, Yuhui CAO, Shikun FENG, Yongfeng CHEN
CPC classification number: G06V10/82, G06V30/19147, G06V30/1444
Abstract: A neural network training method and a document image understanding method are provided. The neural network training method includes: acquiring text comprehensive features of a plurality of first texts in an original image; replacing at least one original region in the original image to obtain a sample image including a plurality of first regions and a ground truth label for indicating whether each first region is a replaced region; acquiring image comprehensive features of the plurality of first regions; inputting the text comprehensive features of the plurality of first texts and the image comprehensive features of the plurality of first regions into a neural network model together to obtain text representation features of the plurality of first texts; determining a predicted label based on the text representation features of the plurality of first texts; and training the neural network model based on the ground truth label and the predicted label.
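The sample-construction step in this abstract (replace at least one region, then record which regions were replaced) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the function name `make_replaced_sample`, the fixed grid of regions, the `donor` image that supplies replacement pixels, and the `replace_prob` parameter are all hypothetical.

```python
import random


def make_replaced_sample(image, donor, grid=(2, 2), replace_prob=0.5, rng=None):
    """Return (sample, labels): a copy of `image` with some grid regions
    overwritten by `donor` pixels, and a 0/1 label per region (1 = replaced).

    Hypothetical helper: the grid layout and donor image are assumptions.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    cell_h, cell_w = height // grid[0], width // grid[1]

    # Decide which regions to replace; force at least one replaced region.
    labels = [1 if rng.random() < replace_prob else 0
              for _ in range(grid[0] * grid[1])]
    if not any(labels):
        labels[rng.randrange(len(labels))] = 1

    sample = [row[:] for row in image]  # leave the original image untouched
    for index, replaced in enumerate(labels):
        if not replaced:
            continue
        top = (index // grid[1]) * cell_h
        left = (index % grid[1]) * cell_w
        for y in range(top, top + cell_h):
            sample[y][left:left + cell_w] = donor[y][left:left + cell_w]
    return sample, labels
```

In a training loop, `labels` would play the role of the ground truth label, and the model's per-region prediction would be compared against it; the model itself is outside the scope of this sketch.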
-
Publication number: US20220382991A1
Publication date: 2022-12-01
Application number: US17883908
Application date: 2022-08-09
Inventor: Qiming PENG, Bin LUO, Yuhui CAO, Shikun FENG, Yongfeng CHEN
IPC: G06F40/30, G06V30/414, G06V30/14
Abstract: The present disclosure provides a training method and apparatus for a document processing model, a device, a storage medium and a program, which relate to the field of artificial intelligence, and in particular, to technologies such as deep learning, natural language processing and text recognition. The specific implementation is: acquiring a first sample document; determining element features of a plurality of document elements in the first sample document and positions corresponding to M position types of each document element according to the first sample document; where the document element corresponds to a character or a document area in the first sample document; and performing training on a basic model according to the element features of the plurality of document elements and the positions corresponding to the M position types of each document element to obtain the document processing model.
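As a rough illustration of "positions corresponding to M position types of each document element", the sketch below assumes M = 3 and picks three position types: reading order, horizontal offset, and vertical offset. The abstract does not say which position types are used, so these choices, along with the `DocumentElement` class, the `element_positions` function, and the bucket count, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DocumentElement:
    text: str
    box: tuple  # (x0, y0, x1, y1) in page pixels


def element_positions(elements, page_width, page_height, buckets=1000):
    """Return, per element, one position for each of M = 3 assumed position types."""
    positions = []
    for order, element in enumerate(elements):
        x0, y0, _, _ = element.box
        positions.append({
            # 1-D position: index of the element in reading order.
            "sequence": order,
            # 2-D positions: top-left corner scaled into a fixed number of buckets.
            "horizontal": min(buckets - 1, x0 * buckets // page_width),
            "vertical": min(buckets - 1, y0 * buckets // page_height),
        })
    return positions
```

Each position could then index its own embedding table, with the resulting vectors combined with the element features before training the basic model; that combination step is likewise an assumption and is not shown.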
-
Publication number: US20210406619A1
Publication date: 2021-12-30
Application number: US17169112
Application date: 2021-02-05
Inventor: Pengyuan LV, Xiaoqiang ZHANG, Shanshan LIU, Chengquan ZHANG, Qiming PENG, Sijin WU, Hua LU, Yongfeng CHEN
IPC: G06K9/72, G06T7/70, G06F40/30, G06K9/46, G06K9/00, G06K9/32, G06K9/20, G06K9/62, G06N20/00, G06N5/04
Abstract: The present disclosure provides a method for visual question answering, which relates to fields of computer vision and natural language processing. The method includes: acquiring an input image and an input question; detecting visual information and position information of each of at least one text region in the input image; determining semantic information and attribute information of each of the at least one text region based on the visual information and the position information; determining a global feature of the input image based on the visual information, the position information, the semantic information, and the attribute information; determining a question feature based on the input question; and generating a predicted answer for the input image and the input question based on the global feature and the question feature. The present disclosure further provides a device for visual question answering, a computer device and a medium.
-
-