METHOD FOR TRAINING TEXT POSITIONING MODEL AND METHOD FOR TEXT POSITIONING

    Publication No.: US20220392242A1

    Publication Date: 2022-12-08

    Application No.: US17819838

    Filing Date: 2022-08-15

    Abstract: A method for training a text positioning model includes: obtaining a sample image, where the sample image contains a sample text to be positioned and a text marking box for the sample text; inputting the sample image into a text positioning model to be trained to position the sample text, and outputting a prediction text box for the sample image; obtaining a sample prior anchor box corresponding to the sample image; and adjusting model parameters of the text positioning model based on the sample prior anchor box, the text marking box and the prediction text box, and continuing training the adjusted text positioning model based on a next sample image until model training is completed, to generate a target text positioning model.
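
The abstract does not say how the sample prior anchor box, the text marking box and the prediction text box are combined when adjusting parameters. As a minimal sketch, assuming the anchor-relative offset encoding and smooth L1 loss common in anchor-based detectors (not confirmed by the source), the loss term could look like the following. The names `encode_box`, `smooth_l1` and `box_loss` are hypothetical.

```python
import math

def encode_box(box, anchor):
    """Encode an (x1, y1, x2, y2) box as offsets relative to an anchor box.

    Assumption: centre/size offsets as used in common anchor-based detectors;
    the patent abstract does not specify an encoding.
    """
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    bx, by = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    bw, bh = box[2] - box[0], box[3] - box[1]
    return ((bx - ax) / aw, (by - ay) / ah,
            math.log(bw / aw), math.log(bh / ah))

def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty: quadratic near zero, linear for large errors."""
    x = abs(x)
    return 0.5 * x * x / beta if x < beta else x - 0.5 * beta

def box_loss(pred_box, marking_box, anchor):
    """Loss between the prediction and the marking box, both encoded
    relative to the same prior anchor box (hypothetical formulation)."""
    target = encode_box(marking_box, anchor)
    pred = encode_box(pred_box, anchor)
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))

if __name__ == "__main__":
    anchor = (0.0, 0.0, 10.0, 10.0)
    marking = (0.0, 0.0, 10.0, 10.0)
    prediction = (1.0, 0.0, 11.0, 10.0)
    print(box_loss(prediction, marking, anchor))
```

In a training loop, this value would be computed for each sample image and used to update the model before moving on to the next sample, as the abstract describes.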

    METHOD AND APPARATUS FOR CORRECTING DISTORTED DOCUMENT IMAGE

    Publication No.: US20210192696A1

    Publication Date: 2021-06-24

    Application No.: US17151783

    Filing Date: 2021-01-19

    Abstract: Embodiments of the present disclosure provide a method and apparatus for correcting a distorted document image, where the method for correcting a distorted document image includes: obtaining a distorted document image; and inputting the distorted document image into a correction model, and obtaining a corrected image corresponding to the distorted document image; where the correction model is a model obtained by training with a set of image samples as inputs and a corrected image corresponding to each image sample in the set of image samples as an output, and the image samples are distorted. By inputting the distorted document image to be corrected into the correction model, the corrected image corresponding to the distorted document image can be obtained through the correction model, which realizes document image correction end-to-end, improves accuracy of the document image correction, and extends application scenarios of the document image correction.

    IMAGE CLASSIFICATION METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM

    Publication No.: US20220027611A1

    Publication Date: 2022-01-27

    Application No.: US17498226

    Filing Date: 2021-10-11

    Abstract: Provided are an image classification method and apparatus, an electronic device and a storage medium, relating to the field of artificial intelligence and, in particular, to computer vision and deep learning. The method includes inputting a to-be-classified document image into a pretrained neural network and obtaining a feature submap of each text box of the to-be-classified document image by use of the neural network; inputting the feature submap of each text box, a semantic feature corresponding to preobtained text information of each text box and a position feature corresponding to preobtained position information of each text box into a pretrained multimodal feature fusion model and fusing, by use of the multimodal feature fusion model, the three into a multimodal feature corresponding to each text box; and classifying the to-be-classified document image based on the multimodal feature corresponding to each text box.
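
The data flow in this abstract (three per-text-box features fused into one multimodal feature, then a document-level classification) can be illustrated with a toy sketch. The patent uses a pretrained multimodal feature fusion model; the concatenation, mean pooling and linear scoring below are simple stand-ins chosen for illustration, and all function names are hypothetical.

```python
def fuse(visual, semantic, position):
    """Stand-in fusion: concatenate the three per-box feature vectors.
    (The patent's fusion model is a pretrained network, not shown here.)"""
    return list(visual) + list(semantic) + list(position)

def mean_pool(vectors):
    """Average equal-length vectors element-wise into one document vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def classify(boxes, weights, biases):
    """Return the index of the highest-scoring class.

    boxes:   list of (visual, semantic, position) feature triples,
             one triple per text box
    weights: one weight row per class (hypothetical linear classifier)
    biases:  one bias per class
    """
    fused = [fuse(*box) for box in boxes]
    doc = mean_pool(fused)
    scores = [sum(w * x for w, x in zip(row, doc)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    boxes = [([1.0], [0.0], [0.0]),
             ([1.0], [0.0], [2.0])]
    weights = [[1.0, 0.0, 0.0],
               [0.0, 0.0, 2.0]]
    biases = [0.0, 0.0]
    print(classify(boxes, weights, biases))
```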

    METHOD AND APPARATUS FOR VISUAL QUESTION ANSWERING, COMPUTER DEVICE AND MEDIUM

    Publication No.: US20210406592A1

    Publication Date: 2021-12-30

    Application No.: US17182987

    Filing Date: 2021-02-23

    Abstract: The present disclosure provides a method for visual question answering. The method includes: acquiring an input image and an input question; constructing a visual graph based on the input image, wherein the visual graph comprises a first node feature and a first edge feature; constructing a question graph based on the input question, wherein the question graph comprises a second node feature and a second edge feature; performing a multimodal fusion on the visual graph and the question graph to obtain an updated visual graph and an updated question graph; determining a question feature based on the input question; determining a fusion feature based on the updated visual graph, the updated question graph and the question feature; and generating a predicted answer for the input image and the input question. The present disclosure further provides an apparatus for visual question answering, a computer device and a medium.
