Invention Application
- Patent Title: Multimodal Learning from Structured and Unstructured Data
- Application No.: US18639519
- Application Date: 2024-04-18
- Publication No.: US20240386321A1
- Publication Date: 2024-11-21
- Inventor: Sayna Ebrahimi, Yihe Dong, Tomas Pfister, Sercan Omer Arik
- Applicant: Google LLC
- Applicant Address: US CA Mountain View
- Assignee: Google LLC
- Current Assignee: Google LLC
- Current Assignee Address: US CA Mountain View
- Main IPC: G06N20/00
- IPC: G06N20/00

Abstract:
Aspects of the disclosure are directed to a multimodal processing system for processing both structured and unstructured data. Real-world data is not always consistent in form or content. The multimodal processing system includes a model that can be trained to account for this characteristic of real-world data by selectively masking data of different modalities during pretraining, so that it learns outputs that are the same or comparable between the masked and unmasked inputs. The model is trained according to modality-specific masking objectives computed for each modality of data and joint-modality, similarity-based masking objectives for a joint representation of the data across all modalities. The system provides consistent and accurate output, even when substantial portions of the input data from different modalities are missing. Cross-modal relationships in the data are reinforced by the model as different portions of the data are masked, contributing to an overall increase in model accuracy compared with other approaches.
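The abstract describes the training objective only at a high level: per-modality masking objectives plus a similarity-based objective on a joint representation. The sketch below is a minimal, hypothetical illustration of how such a combined objective could be computed; it is not the patented method. Everything concrete in it is an assumption: the toy linear encoders, per-entry zero masking with a per-modality rate, cosine-similarity losses (1 - cosine), a mean-pooled joint representation, and the names `masked_pretraining_loss`, `EXAMPLE_INPUTS`, and `EXAMPLE_ENCODERS`. It only evaluates the objective and does not train anything.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def mask(values: Vector, rate: float, rng: random.Random) -> Vector:
    """Hypothetical masking: zero each entry independently with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in values]


def linear_encode(weights: List[Vector], x: Vector) -> Vector:
    """Toy linear encoder standing in for a learned modality-specific encoder (assumption)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def joint(embeddings: List[Vector]) -> Vector:
    """Joint representation as the element-wise mean of per-modality embeddings (assumption)."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def masked_pretraining_loss(
    inputs: Dict[str, Vector],
    encoders: Dict[str, List[Vector]],
    rates: Dict[str, float],
    joint_weight: float,
    rng: random.Random,
) -> Dict[str, float]:
    """Per-modality masking losses plus a joint similarity-based masking loss."""
    names = sorted(inputs)
    clean: Dict[str, Vector] = {}
    masked: Dict[str, Vector] = {}
    losses: Dict[str, float] = {}
    for name in names:
        x = inputs[name]
        clean[name] = linear_encode(encoders[name], x)
        masked[name] = linear_encode(encoders[name], mask(x, rates[name], rng))
        # Modality-specific objective: the masked embedding should match the unmasked one.
        losses[name] = 1.0 - cosine(masked[name], clean[name])
    # Joint objective: compare joint representations of masked vs. unmasked inputs.
    masked_joint = joint([masked[n] for n in names])
    clean_joint = joint([clean[n] for n in names])
    losses["joint"] = 1.0 - cosine(masked_joint, clean_joint)
    losses["total"] = sum(losses[n] for n in names) + joint_weight * losses["joint"]
    return losses


# Hypothetical example data: one tabular row and one text-like embedding.
EXAMPLE_INPUTS: Dict[str, Vector] = {
    "structured": [1.0, 0.5, -2.0],
    "unstructured": [0.2, 0.1, 0.7, 0.4],
}
# Both toy encoders map into the same 2-D space so the joint mean is defined.
EXAMPLE_ENCODERS: Dict[str, List[Vector]] = {
    "structured": [[0.5, -0.1, 0.3], [0.2, 0.4, -0.6]],
    "unstructured": [[0.1, 0.3, -0.2, 0.5], [0.7, -0.4, 0.2, 0.1]],
}

example_losses = masked_pretraining_loss(
    EXAMPLE_INPUTS,
    EXAMPLE_ENCODERS,
    rates={"structured": 0.3, "unstructured": 0.3},
    joint_weight=1.0,
    rng=random.Random(0),
)
```

Setting one modality's rate to 1.0 while leaving the other at 0.0 mimics an input whose entire modality is missing. Under these assumptions the joint term is what rewards the representation for staying close to the full-data case using only the remaining modality. In a real system the encoders would be learned and this quantity minimized by gradient descent.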