Publication No.: US20220327809A1
Publication Date: 2022-10-13
Application No.: US17809133
Filing Date: 2022-06-27
Inventors: Wei Li, Can Gao, Guocheng Niu, Xinyan Xiao, Hao Liu, Jiachen Liu, Hua Wu, Haifeng Wang
IPC: G06V10/778, G06V10/774, G06V10/26, G06F40/284
Abstract: A method for training a model based on multi-modal data joint learning includes: obtaining multi-modal data, where the multi-modal data include at least one type of single-modal data and at least one type of Pair (paired) multi-modal data; inputting the single-modal data and the Pair multi-modal data into a decoupling attention Transformer network model to generate Token semantic representation features and cross-modal semantic representation features, respectively; and training the decoupling attention Transformer network model based on the Token semantic representation features and the cross-modal semantic representation features.
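The abstract does not specify how the "decoupling attention" is realized. The following is a minimal sketch under one assumed reading: when producing Token semantic representations, each token attends only to tokens of its own modality, and when producing cross-modal representations for Pair data, every token may attend to every other token. The helpers `build_mask` and `masked_attention`, the toy token vectors, and the modality labels are all hypothetical. The sketch uses no learned projections and is not the patented model.

```python
import math
from typing import List

Vector = List[float]


def build_mask(modalities: List[str], cross_modal: bool) -> List[List[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    Hypothetical reading of "decoupling": without cross-modal attention, tokens
    see only tokens of the same modality; with it (Pair data), tokens see all tokens.
    """
    n = len(modalities)
    return [[cross_modal or modalities[i] == modalities[j] for j in range(n)]
            for i in range(n)]


def masked_attention(x: List[Vector], mask: List[List[bool]]) -> List[Vector]:
    """Single-head scaled dot-product attention with Q = K = V = x (no learned weights)."""
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        # Masked-out positions get -inf so their softmax weight becomes 0.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if mask[i][j] else -math.inf
                  for j, k in enumerate(x)]
        m = max(scores)  # finite: every token may always attend to itself
        weights = [math.exp(s - m) for s in scores]  # math.exp(-inf) == 0.0
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, x)) / total for c in range(d)])
    return out


# Hypothetical Pair sample: two text tokens followed by two image-patch tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
mods = ["text", "text", "image", "image"]

# Decoupled (within-modality) attention vs. full cross-modal attention.
token_repr = masked_attention(tokens, build_mask(mods, cross_modal=False))
cross_repr = masked_attention(tokens, build_mask(mods, cross_modal=True))
```

With the decoupled mask, the first text token's output mixes only the two text tokens, while the cross-modal mask lets the image tokens contribute as well. A real model would add learned query/key/value projections, multiple heads, and training objectives on both kinds of features; those details are not given in the abstract.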