-
Publication number: US20210406619A1
Publication date: 2021-12-30
Application number: US17169112
Application date: 2021-02-05
Inventor: Pengyuan LV, Xiaoqiang ZHANG, Shanshan LIU, Chengquan ZHANG, Qiming PENG, Sijin WU, Hua LU, Yongfeng CHEN
IPC: G06K9/72, G06T7/70, G06F40/30, G06K9/46, G06K9/00, G06K9/32, G06K9/20, G06K9/62, G06N20/00, G06N5/04
Abstract: The present disclosure provides a method for visual question answering, which relates to the fields of computer vision and natural language processing. The method includes: acquiring an input image and an input question; detecting visual information and position information of each of at least one text region in the input image; determining semantic information and attribute information of each of the at least one text region based on the visual information and the position information; determining a global feature of the input image based on the visual information, the position information, the semantic information, and the attribute information; determining a question feature based on the input question; and generating a predicted answer for the input image and the input question based on the global feature and the question feature. The present disclosure further provides a device for visual question answering, a computer device, and a medium.
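The data flow described in this abstract can be sketched roughly as follows. This is only an illustrative outline, not the patented implementation: the abstract does not name any models, so `TextRegion`, `mean_pool`, `answer_question`, and the four callables passed in are all hypothetical placeholders, and text-region detection is assumed to have already produced the visual and position information.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Box = Tuple[float, float, float, float]  # assumed layout: (x0, y0, x1, y1)


@dataclass
class TextRegion:
    """One detected text region (hypothetical container)."""
    visual: Vector    # visual information from the detection step
    position: Box     # position information from the detection step
    semantic: Vector = field(default_factory=list)
    attribute: Vector = field(default_factory=list)


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean; a stand-in for whatever fusion the method uses."""
    if not vectors:
        return []
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def answer_question(
    regions: List[TextRegion],
    question: str,
    semantic_fn: Callable[[Vector, Box], Vector],
    attribute_fn: Callable[[Vector, Box], Vector],
    question_fn: Callable[[str], Vector],
    answer_fn: Callable[[Vector, Vector], object],
):
    # Semantic and attribute information are derived from the visual and
    # position information of each text region.
    for region in regions:
        region.semantic = semantic_fn(region.visual, region.position)
        region.attribute = attribute_fn(region.visual, region.position)

    # The global feature combines all four kinds of information
    # (here: concatenate per region, then average across regions).
    per_region = [
        r.visual + list(r.position) + r.semantic + r.attribute for r in regions
    ]
    global_feature = mean_pool(per_region)

    # The question feature comes from the input question alone.
    question_feature = question_fn(question)

    # The predicted answer depends on both features.
    return answer_fn(global_feature, question_feature)
```

In a real system each callable would be a trained network; passing them in as parameters simply keeps the sketch independent of any particular model choice.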
-
Publication number: US20230177359A1
Publication date: 2023-06-08
Application number: US18063348
Application date: 2022-12-08
Inventor: Sijin WU, Han LIU, Teng HU, Shikun FENG, Yongfeng CHEN
IPC: G06N5/022, G06F40/174, G06F40/205
CPC classification number: G06N5/022, G06F40/174, G06F40/205, G06F40/30
Abstract: The present disclosure provides a method and apparatus for training a document information extraction model, and a method and apparatus for extracting document information, and relates to the field of artificial intelligence, more particularly to the field of natural language processing. A specific implementation solution is: acquiring a document information extraction model and training data labeled with an answer corresponding to a preset question, where the training data includes layout document training data and streaming document training data; extracting at least one feature from the training data; fusing the at least one feature to obtain a fused feature; inputting the preset question, the fused feature, and the training data into the document information extraction model to obtain a predicted result; and adjusting network parameters of the document information extraction model based on the predicted result and the answer.
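The training steps listed in this abstract follow a familiar supervised loop, which the toy sketch below tries to make concrete. It is an illustration under strong assumptions, not the disclosed method: `Sample`, `extract_features`, `fuse`, `LinearModel`, and `train` are hypothetical names, the labeled answer is reduced to a single number so that a squared-error update can be shown, and a linear model stands in for the actual network.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    document: str   # document text
    kind: str       # "layout" or "streaming" (the two kinds of training data)
    question: str   # the preset question
    answer: float   # labeled answer, simplified to a number for this toy


def extract_features(sample: Sample) -> List[List[float]]:
    """Extract at least one feature; both features here are toy assumptions."""
    length_feature = [float(len(sample.document.split()))]
    kind_feature = [1.0 if sample.kind == "layout" else 0.0]
    return [length_feature, kind_feature]


def fuse(features: List[List[float]]) -> List[float]:
    """Fuse features by concatenation (one simple choice among many)."""
    return [value for feature in features for value in feature]


class LinearModel:
    """Stand-in for the document information extraction model."""

    def __init__(self, size: int) -> None:
        self.weights = [0.0] * size

    def predict(self, question: str, fused: List[float], sample: Sample) -> float:
        # A real model would also encode the question and the document;
        # this toy uses only the fused feature.
        return sum(w * x for w, x in zip(self.weights, fused))


def train(model: LinearModel, data: List[Sample], lr: float, epochs: int) -> LinearModel:
    for _ in range(epochs):
        for sample in data:
            fused = fuse(extract_features(sample))
            predicted = model.predict(sample.question, fused, sample)
            # Adjust parameters from the gap between prediction and answer
            # (gradient step on 0.5 * error ** 2).
            error = predicted - sample.answer
            model.weights = [w - lr * error * x for w, x in zip(model.weights, fused)]
    return model
```

For example, `train(LinearModel(2), data, lr=0.05, epochs=500)` on a small list of `Sample` objects moves the predictions toward the labeled answers; in practice the answer would typically be a text span and the update would come from an automatic differentiation framework.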
-