System and method for supervised contrastive learning for multi-modal tasks

Invention Grant

US12183062B2 System and method for supervised contrastive learning for multi-modal tasks (in force)

Patent Title: System and method for supervised contrastive learning for multi-modal tasks
Application No.: US17589535

Application Date: 2022-01-31
Publication No.: US12183062B2

Publication Date: 2024-12-31
Inventor: Changsheng Zhao, Burak Uzkent, Yilin Shen, Hongxia Jin
Applicant: Samsung Electronics Co., Ltd.
Applicant Address: KR Suwon-si
Assignee: Samsung Electronics Co., Ltd.
Current Assignee: Samsung Electronics Co., Ltd.
Current Assignee Address: KR Suwon-si
Main IPC: G06V10/80
IPC: G06V10/80; G06F40/279; G06V10/774; G06V10/778

Abstract:

A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.
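
The abstract does not state the form of either loss, so the following is only a minimal sketch of how a per-pair combination of losses might be assembled. Everything in it is an assumption made for illustration: the toy embeddings, the dot-product score, the binary cross-entropy form of both losses, the `weight` parameter, and the names `select_unpaired` and `combined_loss`. Unpaired pairs are formed here by recombining images and texts within the batch, which is one way to satisfy the condition that each unpaired pair reuses either the image or the text of the paired pair.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors (stand-in for a learned score)."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def select_unpaired(pairs, i):
    """Return unpaired (image, text) pairs that reuse the image or text of pair i."""
    image_i, text_i = pairs[i]
    unpaired = []
    for j, (image_j, text_j) in enumerate(pairs):
        if j != i:
            unpaired.append((image_i, text_j))  # same image, mismatched text
            unpaired.append((image_j, text_i))  # mismatched image, same text
    return unpaired


def combined_loss(pairs, weight=1.0):
    """Average over paired pairs of (first loss + weight * second loss).

    Assumed loss forms (not taken from the patent):
      first  = -log sigmoid(score(paired pair))
      second = mean of -log(1 - sigmoid(score(unpaired pair)))
    """
    total = 0.0
    for i, (image, text) in enumerate(pairs):
        first = -math.log(sigmoid(dot(image, text)))
        negatives = select_unpaired(pairs, i)
        second = -sum(
            math.log(1.0 - sigmoid(dot(im, tx))) for im, tx in negatives
        ) / len(negatives)
        total += first + weight * second
    return total / len(pairs)


# Toy batch: each image embedding matches its own text embedding.
aligned = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
# Same embeddings with the texts swapped, so the "paired" pairs do not match.
misaligned = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0])]

print(combined_loss(aligned) < combined_loss(misaligned))
```

In this toy setup the combined loss is lower when each paired image and text agree and the recombined pairs disagree, which is the behavior a contrastive objective of this kind is meant to reward.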

Published/Granted documents

US20230245435A1 SYSTEM AND METHOD FOR SUPERVISED CONTRASTIVE LEARNING FOR MULTI-MODAL TASKS (publication date: 2023-08-03)

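IPC classification:

G	Physics
G06	Computing; calculating or counting
G06V	Image or video recognition or understanding
G06V10/00	Arrangements for image or video recognition or understanding (character recognition in images or videos G06V30/10)
G06V10/70	.using pattern recognition or machine learning (optical pattern recognition or electronic computation G06V10/88)
G06V10/77	..processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
G06V10/80	...fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level (multimodal speaker identification or verification G10L17/10)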