Invention Application
- Patent Title: PRE-TRAINING OF COMPUTER VISION FOUNDATIONAL MODELS
- Application No.: PCT/US2022/043568
- Application Date: 2022-09-15
- Publication No.: WO2023091227A1
- Publication Date: 2023-05-25
- Inventor: YUAN, Lu; LI, Chunyuan; YANG, Jianwei; XIAO, Bin
- Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC.
- Applicant Address: One Microsoft Way
- Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC.
- Current Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC.
- Current Assignee Address: One Microsoft Way
- Agency: CHATTERJEE, Aaron C. et al.
- Priority: US17/821,596 2022-08-23
- Main IPC: G06V20/70
- IPC: G06V20/70 ; G06K9/62 ; G06V10/82
Abstract:
Examples are provided for pre-training a computer vision foundation model. A representative method comprises curating a pre-training database of image-text pairs from weakly labeled data. Text descriptions from the image-text pairs are encoded by a language encoder, and the corresponding images are encoded using a hierarchical vision transformer with shifted windows and convolutional embedding. Based on the encoded images and encoded text, the computer vision foundation model is pre-trained via unified image-text contrastive learning.
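The image-text contrastive objective described in the abstract can be sketched as a bidirectional contrastive loss over a batch of matched (image, text) embedding pairs. This is a minimal NumPy illustration of that general technique; the function name, the temperature value, and the loss details are assumptions for illustration, not the patent's actual implementation.

```python
import numpy as np

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Bidirectional image-text contrastive loss over a batch of
    matched (image, text) embedding pairs. Hypothetical sketch; the
    patent's unified contrastive loss may differ in detail."""
    # L2-normalize embeddings so dot products are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # Pairwise similarity logits, scaled by temperature.
    logits = img @ txt.T / temperature
    n = logits.shape[0]

    def cross_entropy(l):
        # Matched pairs sit on the diagonal; treat them as targets.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With well-matched pairs (high diagonal similarity, low off-diagonal similarity) the loss is near zero; shuffling the text embeddings so pairs no longer correspond drives it up, which is what pushes the encoders to align images with their descriptions during pre-training.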