-
Publication number: US11854283B2
Publication date: 2023-12-26
Application number: US17169112
Application date: 2021-02-05
Inventor: Pengyuan Lv, Xiaoqiang Zhang, Shanshan Liu, Chengquan Zhang, Qiming Peng, Sijin Wu, Hua Lu, Yongfeng Chen
IPC: G06V30/262, G06T7/70, G06V30/413, G06V20/62, G06F16/33, G06V30/19, G06V10/82, G06V30/416
CPC classification number: G06V30/274, G06F16/3344, G06T7/70, G06V10/82, G06V20/62, G06V30/19173, G06V30/413, G06V30/416, G06T2207/30176
Abstract: The present disclosure provides a method for visual question answering, which relates to the fields of computer vision and natural language processing. The method includes: acquiring an input image and an input question; detecting visual information and position information of each of at least one text region in the input image; determining semantic information and attribute information of each of the at least one text region based on the visual information and the position information; determining a global feature of the input image based on the visual information, the position information, the semantic information, and the attribute information; determining a question feature based on the input question; and generating a predicted answer for the input image and the input question based on the global feature and the question feature. The present disclosure further provides a device for visual question answering, a computer device, and a medium.
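The data flow described in the abstract above can be sketched roughly as follows. This is only an illustrative outline, not the patented implementation: every function name, the `TextRegion` type, the feature sizes, and the toy arithmetic standing in for the learned detector, encoders, and answer predictor are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vector = List[float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed normalized


@dataclass
class TextRegion:
    visual: Vector                      # visual information of the text region
    position: Box                       # position information of the text region
    semantic: Vector = field(default_factory=list)
    attribute: Vector = field(default_factory=list)


def detect_text_regions(image: Sequence[Tuple[Vector, Box]]) -> List[TextRegion]:
    # Toy stand-in for a text detector: the "image" is already a list of
    # (visual feature, bounding box) pairs, one per text region.
    return [TextRegion(visual=list(v), position=tuple(b)) for v, b in image]


def add_semantic_and_attribute(region: TextRegion) -> None:
    # Placeholder for the learned step that derives semantic and attribute
    # information from the visual and position information.
    cx = (region.position[0] + region.position[2]) / 2
    cy = (region.position[1] + region.position[3]) / 2
    region.semantic = [sum(region.visual), cx]
    region.attribute = [max(region.visual), cy]


def global_feature(regions: List[TextRegion]) -> Vector:
    # Concatenate the four kinds of information per region, then mean-pool
    # over regions (the pooling choice is an assumption).
    if not regions:
        raise ValueError("at least one text region is required")
    rows = [r.visual + list(r.position) + r.semantic + r.attribute for r in regions]
    return [sum(col) / len(rows) for col in zip(*rows)]


def question_feature(question: str, dim: int) -> Vector:
    # Deterministic bag-of-words hashing as a stand-in for a question encoder.
    feat = [0.0] * dim
    for token in question.lower().split():
        feat[sum(map(ord, token)) % dim] += 1.0
    return feat


def answer_question(image, question: str, answers: List[str]) -> str:
    regions = detect_text_regions(image)
    for region in regions:
        add_semantic_and_attribute(region)
    g = global_feature(regions)
    q = question_feature(question, dim=len(g))
    # Fuse by element-wise product and pick an answer with a toy rule that
    # stands in for a trained classifier.
    score = sum(a * b for a, b in zip(g, q))
    return answers[int(score) % len(answers)]
```

Calling `answer_question(image, "What is the store name?", ["yes", "no", "unknown"])` with a list of `(visual, box)` pairs returns one of the candidate answers. The point of the sketch is the ordering of the steps (region detection, per-region enrichment, global feature, question feature, fusion), not the quality of the prediction.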
-
Publication number: US20230222827A1
Publication date: 2023-07-13
Application number: US18181800
Application date: 2023-03-10
Inventor: Wenjin Wang, Zhengjie Huang, Bin Luo, Qiming Peng, Weichong Yin, Shikun Feng, Shiwei Huang, Jingzhou He
IPC: G06V30/414, G06V30/18, G06F40/30, G06F40/295
CPC classification number: G06V30/414, G06F40/30, G06F40/295, G06V30/18143
Abstract: In a method for processing a document image, a document image to be processed is acquired. Text nodes of multiple granularities, visual nodes of multiple granularities, respective node information of the text nodes, and respective node information of the visual nodes in the document image are obtained. A multi-granularity and multi-modality document graph is constructed based on the text nodes of multiple granularities, the visual nodes of multiple granularities, the respective node information of the text nodes, and the respective node information of the visual nodes. Multi-granularity semantic feature information of the document image is determined based on the multi-granularity and multi-modality document graph, the respective node information of the text nodes, and the respective node information of the visual nodes.
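One way to picture the graph construction step in the abstract above is the following minimal sketch. It covers only building the graph, not the later feature extraction. The abstract does not state how edges are chosen, so the rules used here (containment links between granularities of the same modality, overlap links across modalities), the `Node` type, and the granularity names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class Node:
    node_id: str
    modality: str      # "text" or "visual"
    granularity: str   # hypothetical labels, e.g. "word", "line", "patch", "page"
    box: Box           # node information: layout position
    content: str = ""  # node information: text content (empty for visual nodes)


def contains(outer: Box, inner: Box) -> bool:
    # True when `inner` lies entirely within `outer`.
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def overlaps(a: Box, b: Box) -> bool:
    # True when the two boxes share a region of positive area.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def build_document_graph(nodes: List[Node]) -> Dict[str, Set[str]]:
    # Undirected adjacency sets keyed by node id.
    graph: Dict[str, Set[str]] = {n.node_id: set() for n in nodes}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if a.modality == b.modality:
                # Assumed rule: link different granularities of one modality
                # when one node spatially contains the other.
                linked = (a.granularity != b.granularity
                          and (contains(a.box, b.box) or contains(b.box, a.box)))
            else:
                # Assumed rule: link text and visual nodes that overlap.
                linked = overlaps(a.box, b.box)
            if linked:
                graph[a.node_id].add(b.node_id)
                graph[b.node_id].add(a.node_id)
    return graph
```

Under these assumed rules, a word node is connected to the line that contains it and to every visual node it overlaps, but not to sibling words. A downstream model (for example a graph neural network) would then propagate node information along these edges to obtain multi-granularity semantic features.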
-