IMAGE CLASSIFICATION METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM

Invention publication

EP3923185A2 IMAGE CLASSIFICATION METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM Under examination (published)

Patent title: IMAGE CLASSIFICATION METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM
Application number: EP21202754.4

Filing date: 2021-10-14
Publication number: EP3923185A2

Publication date: 2021-12-15
Inventors: YU, Yuechen; ZHANG, Chengquan; LI, Yulin; ZHANG, Xiaoqiang; HUANG, Ju; QIN, Xiameng; YAO, Kun; LIU, Jingtuo; HAN, Junyu; DING, Errui
Applicant: Beijing Baidu Netcom Science and Technology Co., Ltd.
Applicant address: CN Beijing 100085 2/F Baidu Campus No.10 Shangdi 10th Street Haidian District
Representative: Regimbeau
Priority: CN202110235776, 2021-03-03
Main classification: G06K9/00
IPC classification: G06K9/00 ; G06K9/62

Abstract:

Provided are an image classification method and apparatus, an electronic device and a storage medium, relating to the field of artificial intelligence and, in particular, to computer vision and deep learning. The method includes: inputting (S101, S201, S301) a to-be-classified document image into a pretrained neural network and obtaining, by use of the neural network, a feature submap of each text box of the document image; inputting (S102, S202, S302) the feature submap of each text box, a semantic feature corresponding to preobtained text information of each text box and a position feature corresponding to preobtained position information of each text box into a pretrained multimodal feature fusion model and fusing, by use of that model, the three features into a multimodal feature corresponding to each text box; and classifying (S103) the document image based on the multimodal feature corresponding to each text box. Making full use of the semantic and position features in the document image improves the classification accuracy of the document image.
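
The three steps in the abstract (S101 feature submaps, S102 multimodal fusion, S103 classification) can be illustrated with a minimal sketch. It is not the patented implementation: the abstract relies on a pretrained neural network and a pretrained multimodal feature fusion model, whereas the sketch below assumes toy stand-ins (average pooling for the visual feature, a digit ratio for the semantic feature, a normalised box centre for the position feature, concatenation for fusion and a linear scorer for classification). All function names, the `Box` layout and the class weights are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical box layout: (x0, y0, x1, y1) in feature-map cells; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int]


def crop_submap(feature_map: Sequence[Sequence[float]], box: Box) -> List[List[float]]:
    """S101 (toy): cut out the part of the feature map covered by one text box."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in feature_map[y0:y1]]


def pool(submap: Sequence[Sequence[float]]) -> float:
    """Average-pool a feature submap into a single visual value."""
    cells = [value for row in submap for value in row]
    return sum(cells) / len(cells)


def semantic_feature(text: str) -> float:
    """Stand-in for a text embedding: fraction of digit characters in the text."""
    return sum(ch.isdigit() for ch in text) / max(len(text), 1)


def position_feature(box: Box, width: int, height: int) -> List[float]:
    """Stand-in for a position embedding: box centre normalised to [0, 1]."""
    x0, y0, x1, y1 = box
    return [(x0 + x1) / (2 * width), (y0 + y1) / (2 * height)]


def fuse(visual: float, semantic: float, position: Sequence[float]) -> List[float]:
    """S102 (toy): plain concatenation stands in for the fusion model."""
    return [visual, semantic, *position]


def classify(fused_per_box: Sequence[Sequence[float]],
             class_weights: Dict[str, Sequence[float]]) -> str:
    """S103 (toy): mean-pool the per-box features, then pick the best linear score."""
    count = len(fused_per_box)
    dims = len(fused_per_box[0])
    doc = [sum(box_feat[i] for box_feat in fused_per_box) / count for i in range(dims)]
    scores = {
        label: sum(w * x for w, x in zip(weights, doc))
        for label, weights in class_weights.items()
    }
    return max(scores, key=scores.get)


def classify_document(feature_map: Sequence[Sequence[float]],
                      boxes: Sequence[Box],
                      texts: Sequence[str],
                      class_weights: Dict[str, Sequence[float]]) -> str:
    """Run the three toy steps end to end for one document."""
    height, width = len(feature_map), len(feature_map[0])
    fused = [
        fuse(pool(crop_submap(feature_map, box)),
             semantic_feature(text),
             position_feature(box, width, height))
        for box, text in zip(boxes, texts)
    ]
    return classify(fused, class_weights)


if __name__ == "__main__":
    demo_map = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]
    demo_boxes = [(0, 0, 2, 2), (2, 2, 4, 4)]
    demo_texts = ["Total", "12.50"]
    # Made-up weights: digit-heavy documents score as "invoice".
    demo_weights = {"invoice": [0, 1, 0, 0], "letter": [0, -1, 0, 0]}
    print(classify_document(demo_map, demo_boxes, demo_texts, demo_weights))
```

The point of the sketch is only the data flow: each text box contributes one visual, one semantic and one position feature, the three are fused per box, and the document label is derived from the fused per-box features. In a real system each stand-in would be replaced by a learned component.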

Published/granted documents

EP3923185A3 IMAGE CLASSIFICATION METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM Publication/grant date: 2022-04-27

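IPC classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signals G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)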