-
Publication number: US20230124389A1
Publication date: 2023-04-20
Application number: US17887690
Application date: 2022-08-15
Inventor: Longchao WANG, Yipeng SUN, Kun YAO, Junyu HAN, Jingtuo LIU, Errui DING
IPC: G06V10/70, G06V10/774
Abstract: A model determination method and electronic device are provided, relating to the technical field of artificial intelligence, in particular to computer vision and deep learning, and applicable to image processing, image identification, and other scenarios. A specific implementation includes: acquiring an image sample and a text sample, wherein text data in the text sample provides a textual description of target image data in the image sample; storing at least one image feature of the image sample in a first queue, and storing at least one text feature of the text sample in a second queue; training on the first queue and the second queue to obtain a first target model; and determining the first target model as an initialization model for a second target model.
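The two-queue arrangement described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the queues are consumed during training, so the sketch uses an InfoNCE-style contrastive loss in which queued text features act as negatives for an image feature. The names FeatureQueue and contrastive_loss, the queue size, and the temperature are hypothetical and not taken from the patent.

```python
import math
from collections import deque


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class FeatureQueue:
    """Fixed-size FIFO buffer of feature vectors (hypothetical helper)."""

    def __init__(self, max_size):
        # deque(maxlen=...) silently drops the oldest entry when full
        self.items = deque(maxlen=max_size)

    def enqueue(self, features):
        self.items.extend(list(f) for f in features)

    def snapshot(self):
        return list(self.items)


def contrastive_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE-style loss for one query (assumed, not named in the abstract).

    The query should be closer to its paired positive than to any
    queued negative; the loss is -log softmax of the positive logit.
    """
    q = normalize(query)
    logits = [dot(q, normalize(positive)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_sum = math.log(sum(math.exp(z - top) for z in logits))
    return (top - logits[0]) + log_sum


# First queue holds image features, second queue holds text features.
image_queue = FeatureQueue(max_size=4)
text_queue = FeatureQueue(max_size=4)
image_queue.enqueue([[0.9, 0.1], [0.2, 0.8]])
text_queue.enqueue([[0.1, 0.9], [-0.7, 0.3]])

# Loss for a new image feature against its paired text feature,
# using previously queued text features as negatives.
loss = contrastive_loss([1.0, 0.0], [0.95, 0.05], text_queue.snapshot())
```

In this sketch the queues only decouple the number of negatives from the batch size; whether the patented method uses them this way is not stated in the abstract.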
-
Publication number: US20220392205A1
Publication date: 2022-12-08
Application number: US17892669
Application date: 2022-08-22
Inventor: Yipeng SUN, Rongqiao AN, Xiang WEI, Longchao WANG, Kun YAO, Junyu HAN, Jingtuo LIU, Errui DING
Abstract: Embodiments of the present disclosure provide a method and apparatus for training an image recognition model based on semantic enhancement, a method and apparatus for recognizing an image, an electronic device, and a computer-readable storage medium. The method for training an image recognition model based on semantic enhancement comprises: extracting, from an input first image that is unannotated and has no textual description, a first feature representation of the first image; calculating a first loss function based on the first feature representation; extracting, from an input second image that is unannotated and has an original textual description, a second feature representation of the second image; calculating a second loss function based on the second feature representation; and training an image recognition model based on a fusion of the first loss function and the second loss function.
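The "fusion of the first loss function and the second loss function" can be pictured with a small sketch. The abstract does not say how the two losses are fused, so the convex weighted sum below is only one plausible choice; the function name fuse_losses and the weight parameter are hypothetical.

```python
def fuse_losses(first_loss, second_loss, weight=0.5):
    """Combine two scalar losses as a convex weighted sum (assumed form).

    first_loss:  loss computed from images without textual descriptions
    second_loss: loss computed from images with original textual descriptions
    weight:      share given to first_loss, between 0 and 1 (hypothetical)
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * first_loss + (1.0 - weight) * second_loss


# Example: give the text-supervised loss three times the weight.
total = fuse_losses(first_loss=2.0, second_loss=4.0, weight=0.25)
```

A single scalar weight keeps the sketch simple; a schedule that changes the weight during training would fit the same interface.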
-
Publication number: US20230386168A1
Publication date: 2023-11-30
Application number: US18192393
Application date: 2023-03-29
Inventor: Yipeng SUN, Mengjun CHENG, Longchao WANG, Xiongwei ZHU, Kun YAO, Junyu HAN, Jingtuo LIU, Errui DING, Jingdong WANG, Haifeng WANG
IPC: G06V10/42, G06F16/583, H04N19/176
CPC classification number: G06V10/42, G06F16/5846, H04N19/176
Abstract: A pre-training method for a Vision and Scene Text Aggregation model includes: acquiring a sample image-text pair; extracting a sample scene text from a sample image; inputting a sample text into a text encoding network to obtain a sample text feature; inputting the sample image and an initial sample aggregation feature into a visual encoding subnetwork and inputting the initial sample aggregation feature and the sample scene text into a scene encoding subnetwork to obtain a global image feature of the sample image and a learned sample aggregation feature; and pre-training the Vision and Scene Text Aggregation model according to the sample text feature, the global image feature of the sample image, and the learned sample aggregation feature.
-