Invention Application
- Patent Title: PRE-TRAINING OF COMPUTER VISION FOUNDATIONAL MODELS
- Application No.: PCT/US2022/043568
- Application Date: 2022-09-15
- Publication No.: WO2023091227A1
- Publication Date: 2023-05-25
- Inventor: YUAN, Lu; LI, Chunyuan; YANG, Jianwei; XIAO, Bin
- Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC.
- Applicant Address: One Microsoft Way
- Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC.
- Current Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC.
- Current Assignee Address: One Microsoft Way
- Agency: CHATTERJEE, Aaron C. et al.
- Priority: US17/821,596 2022-08-23
- Main IPC: G06V20/70
- IPC: G06V20/70 ; G06K9/62 ; G06V10/82
Abstract:
Examples are provided for pre-training a computer vision foundation model. A representative method comprises curating a pre-training database of image-text pairs from weakly labeled data. Text descriptions from the image-text pairs are encoded by a language encoder, and the corresponding images are encoded using a hierarchical vision transformer with shifted windows and convolutional embedding. Based on the encoded images and encoded text, the computer vision foundation model is pre-trained via unified image-text contrastive learning.
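The image-text contrastive objective described in the abstract can be sketched as a bidirectional contrastive loss over a batch of matched (image, text) embedding pairs. This is a minimal NumPy illustration of that general technique; the function name, the temperature value, and the loss details are assumptions for illustration, not the patent's actual implementation.

```python
import numpy as np

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Bidirectional image-text contrastive loss over a batch of
    matched (image, text) embedding pairs. Hypothetical sketch; the
    patent's unified contrastive loss may differ in detail."""
    # L2-normalize embeddings so dot products are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # Pairwise similarity logits, scaled by temperature.
    logits = img @ txt.T / temperature
    n = logits.shape[0]

    def cross_entropy(l):
        # Matched pairs sit on the diagonal; treat them as targets.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With well-matched pairs (high diagonal similarity, low off-diagonal similarity) the loss is near zero; shuffling the text embeddings so pairs no longer correspond drives it up, which is what pushes the encoders to align images with their descriptions during pre-training.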