-
Publication number: US11775574B2
Publication date: 2023-10-03
Application number: US17182987
Application date: 2021-02-23
Inventor: Yulin Li, Xiameng Qin, Ju Huang, Qunyi Xie, Junyu Han
IPC: G06F16/00, G06F16/36, G06F40/279, G06F18/25, G06V10/764, G06V10/80, G06V10/82, G06V10/44, G06V10/426, G06N3/02
CPC classification number: G06F16/367, G06F18/253, G06F40/279, G06V10/426, G06V10/454, G06V10/764, G06V10/811, G06V10/82, G06N3/02
Abstract: A method for visual question answering, a computer device implementing the method and a medium for storing instructions on performing the method are provided. The method includes: acquiring an input image and an input question; constructing a visual graph based on the input image, wherein the visual graph comprises a first node feature and a first edge feature; constructing a question graph based on the input question, wherein the question graph comprises a second node feature and a second edge feature; performing a multimodal fusion on the visual graph and the question graph to obtain an updated visual graph and an updated question graph; determining a question feature based on the input question; determining a fusion feature based on the updated visual graph, the updated question graph and the question feature; and generating a predicted answer for the input image and the input question.
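The multimodal fusion step in this abstract could be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the two graphs are fused, so the sketch assumes a simple cross-graph attention with a residual update, ignores the edge features, and uses hypothetical helper names (`attend`, `fuse_graphs`) and toy feature values.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys):
    """For each query node, return the attention-weighted mean of the key nodes."""
    dim = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
    return out

def fuse_graphs(visual_nodes, question_nodes):
    """Assumed fusion: each graph's nodes receive a message from the other graph."""
    v_msg = attend(visual_nodes, question_nodes)
    q_msg = attend(question_nodes, visual_nodes)
    updated_visual = [[a + b for a, b in zip(v, m)] for v, m in zip(visual_nodes, v_msg)]
    updated_question = [[a + b for a, b in zip(q, m)] for q, m in zip(question_nodes, q_msg)]
    return updated_visual, updated_question

# Toy node features (hypothetical): 3 image regions, 2 question words.
visual_nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question_nodes = [[1.0, 0.0], [0.0, 1.0]]
updated_visual, updated_question = fuse_graphs(visual_nodes, question_nodes)
```

Both graphs keep their node counts after fusion; only the node features change, which matches the abstract's notion of an "updated" visual graph and question graph.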
-
Publication number: US20210390294A1
Publication date: 2021-12-16
Application number: US17139403
Application date: 2020-12-31
Inventor: Xiangkai Huang, Qiaoyi Li, Yulin Li, Ju Huang, Duohao Qin, Xiameng Qin, Minghao Liu, Junyu Han, Jiangliang Guo
Abstract: Embodiments of the present disclosure disclose an image table extraction method and apparatus, an electronic device, a storage media, and a training method for a table extraction model, which relate to the field of artificial intelligence technologies and cloud computing technologies, including: acquiring an image to be processed; generating a table of the image to be processed according to a table extraction model, where the table extraction model is obtained according to a field position feature, an image feature, and a text feature of a sample image; and filling text information of the image to be processed into the table.
-
Publication number: US20220148324A1
Publication date: 2022-05-12
Application number: US17581047
Application date: 2022-01-21
Inventor: Xiameng Qin, Yulin Li, Ju Huang, Qunyi Xie, Chengquan Zhang, Kun Yao, Jingtuo Liu, Junyu Han
IPC: G06V30/18, G06V30/24, G06V30/148, G06V30/19
Abstract: Provided are a method and apparatus for extracting information about a negotiable instrument, an electronic device and a storage medium. The method includes inputting a to-be-recognized negotiable instrument into a pretrained deep learning network and obtaining a visual image corresponding to the to-be-recognized negotiable instrument through the deep learning network; matching the visual image corresponding to the to-be-recognized negotiable instrument with a visual image corresponding to each negotiable-instrument template in a preconstructed base template library; and in response to the visual image corresponding to the to-be-recognized negotiable instrument successfully matching a visual image corresponding to one negotiable-instrument template in the base template library, extracting structured information of the to-be-recognized negotiable instrument by using the negotiable-instrument template.
-
Publication number: US11687704B2
Publication date: 2023-06-27
Application number: US17207179
Application date: 2021-03-19
Inventor: Qiaoyi Li, Xiangkai Huang, Yulin Li, Ju Huang, Xiameng Qin, Duohao Qin, Minghao Liu, Junyu Han
CPC classification number: G06F40/174, G06F16/93, G06V30/19013, G06V30/19173, G06V30/40, G06V30/10
Abstract: Disclosed are a method, apparatus and electronic device for annotating information of a structured document. A specific implementation is: obtaining a template image of a structured document and at least one piece of annotation information of a field to be filled in the template image, where the annotation information includes attribute value and historical content of the field to be filled, and historical position of the field to be filled in the template image; generating, according to the attribute value of the field to be filled, the historical content of the field to be filled and the historical position of the field to be filled in the template image, target filling information of the field to be filled; obtaining, according to the target filling information of the field to be filled, an image of an annotated structured document.
-
Publication number: US11768876B2
Publication date: 2023-09-26
Application number: US17161466
Application date: 2021-01-28
Inventor: Xiameng Qin, Yulin Li, Qunyi Xie, Ju Huang, Junyu Han
IPC: G06F16/9032, G06F16/583, G06F16/532, G06F40/279, G06N3/04, G06N3/088, G06F18/213, G06F18/25, G06V10/25, G06V10/764, G06V10/80, G06V10/82, G06V10/44
CPC classification number: G06F16/90332, G06F16/532, G06F16/583, G06F18/213, G06F18/253, G06F40/279, G06N3/04, G06N3/088, G06V10/25, G06V10/454, G06V10/764, G06V10/806, G06V10/82, G06V2201/07
Abstract: The present disclosure provides a method for visual question answering, which relates to a field of computer vision and natural language processing. The method includes: acquiring an input image and an input question; constructing a Visual Graph based on the input image, wherein the Visual Graph comprises a Node Feature and an Edge Feature; updating the Node Feature by using the Node Feature and the Edge Feature to obtain an updated Visual Graph; determining a question feature based on the input question; fusing the updated Visual Graph and the question feature to obtain a fused feature; and generating a predicted answer for the input image and the input question based on the fused feature. The present disclosure further provides an apparatus for visual question answering, a computer device and a non-transitory computer-readable storage medium.
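The pipeline in this abstract (update node features using edge features, fuse with a question feature, predict an answer) might look roughly like the sketch below. It is a toy illustration, not the patented method: the edge-weighted averaging, max pooling, element-wise product and linear scoring are all assumptions chosen for brevity, and every function name and value here is hypothetical.

```python
def update_nodes(node_features, edge_weights):
    """One assumed message-passing step: each node becomes the
    edge-weighted mean of the nodes it is connected to."""
    dim = len(node_features[0])
    updated = []
    for i, row in enumerate(edge_weights):
        total = sum(row)
        if total == 0:
            # Isolated node: keep its original feature.
            updated.append(list(node_features[i]))
            continue
        updated.append([
            sum(w * node_features[j][d] for j, w in enumerate(row)) / total
            for d in range(dim)
        ])
    return updated

def fuse(node_features, question_feature):
    """Max-pool over nodes, then multiply element-wise with the question feature."""
    dim = len(question_feature)
    pooled = [max(node[d] for node in node_features) for d in range(dim)]
    return [p * q for p, q in zip(pooled, question_feature)]

def predict(fused, answer_weights):
    """Score each candidate answer with a linear layer and return the best one."""
    scores = {
        answer: sum(w * f for w, f in zip(weights, fused))
        for answer, weights in answer_weights.items()
    }
    return max(scores, key=scores.get)

# Toy inputs (hypothetical): 2 nodes with 2-dimensional features.
nodes = [[1.0, 0.0], [0.0, 2.0]]
edges = [[1.0, 1.0], [0.0, 1.0]]   # edges[i][j]: weight of the edge from node j into node i
question = [0.5, 1.0]
answers = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}

updated = update_nodes(nodes, edges)   # [[0.5, 1.0], [0.0, 2.0]]
fused = fuse(updated, question)        # [0.25, 2.0]
answer = predict(fused, answers)       # "no"
```

In a real system the node features would come from an object detector, the question feature from a text encoder, and the weights from training; the sketch only shows how the stages connect.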