-
公开(公告)号:US11854283B2
公开(公告)日:2023-12-26
申请号:US17169112
申请日:2021-02-05
Inventor: Pengyuan Lv , Xiaoqiang Zhang , Shanshan Liu , Chengquan Zhang , Qiming Peng , Sijin Wu , Hua Lu , Yongfeng Chen
IPC: G06V30/262 , G06T7/70 , G06V30/413 , G06V20/62 , G06F16/33 , G06V30/19 , G06V10/82 , G06V30/416
CPC classification number: G06V30/274 , G06F16/3344 , G06T7/70 , G06V10/82 , G06V20/62 , G06V30/19173 , G06V30/413 , G06V30/416 , G06T2207/30176
Abstract: The present disclosure provides a method for visual question answering, which relates to fields of computer vision and natural language processing. The method includes: acquiring an input image and an input question; detecting visual information and position information of each of at least one text region in the input image; determining semantic information and attribute information of each of the at least one text region based on the visual information and the position information; determining a global feature of the input image based on the visual information, the position information, the semantic information, and the attribute information; determining a question feature based on the input question; and generating a predicted answer for the input image and the input question based on the global feature and the question feature. The present disclosure further provides a device for visual question answering, a computer device and a medium.