1.
Publication No.: US20240119725A1
Publication Date: 2024-04-11
Application No.: US18531426
Application Date: 2023-12-06
Inventors: Yan ZENG, Xinsong ZHANG, Hang LI
IPC: G06V10/96, G06F40/40, G06V10/75, G06V10/774
CPC classification number: G06V10/96, G06F40/40, G06V10/759, G06V10/774
Abstract: Provided in the present application are a method and an apparatus for training a visual language pre-training model, as well as a device and a medium. The method includes: acquiring pairing groups respectively corresponding to N images, wherein the pairing groups of a first image include a first pairing group composed of the first image and description text of the first image, and a second pairing group composed of a local image of the first image and description text of the local image, where N is an integer greater than 1 and the first image is any one of the N images; and training the visual language pre-training model according to the pairing groups respectively corresponding to the N images.
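The abstract does not specify a data layout, the model, or the training objective. The following is a minimal Python sketch of how the per-image pairing groups it describes might be assembled, assuming each image comes with a whole-image caption plus one local region (a bounding box) and that region's caption. All names here (`Pair`, `PairingGroup`, `build_pairing_groups`, `flatten_for_training`, and the record keys `id`, `caption`, `region_box`, `region_caption`) are hypothetical illustrations, not taken from the patent. The training step itself (model and loss) is deliberately not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical region format (assumption): (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Pair:
    image_id: str
    box: Optional[Box]  # None means the whole image; otherwise a local image (crop)
    text: str           # description text of the image or of the local image


@dataclass
class PairingGroup:
    first: Pair   # first pairing group: (image, description text of the image)
    second: Pair  # second pairing group: (local image, description text of the local image)


def build_pairing_groups(records: List[dict]) -> List[PairingGroup]:
    """Build the pairing groups for each of the N images.

    Each record is a hypothetical dict with keys
    'id', 'caption', 'region_box', and 'region_caption'.
    """
    if len(records) <= 1:
        raise ValueError("N must be an integer greater than 1")
    groups = []
    for r in records:
        first = Pair(r["id"], None, r["caption"])
        second = Pair(r["id"], r["region_box"], r["region_caption"])
        groups.append(PairingGroup(first, second))
    return groups


def flatten_for_training(groups: List[PairingGroup]) -> List[Pair]:
    """Flatten all image-text pairs into one list that a training loop could batch."""
    pairs: List[Pair] = []
    for g in groups:
        pairs.extend([g.first, g.second])
    return pairs


# Usage with two toy records (N = 2).
records = [
    {"id": "img0", "caption": "a dog on a beach",
     "region_box": (10, 20, 110, 140), "region_caption": "a brown dog"},
    {"id": "img1", "caption": "a red car in a street",
     "region_box": (0, 50, 200, 150), "region_caption": "a red car"},
]
groups = build_pairing_groups(records)
pairs = flatten_for_training(groups)
print(len(pairs))  # 4: one whole-image pair and one local-image pair per image
```

Keeping whole-image and local-image pairs in the same flat list is one possible design choice. It lets a single batch mix global and region-level supervision. The patent abstract does not say how the two kinds of pairs are weighted or batched.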