Invention Publication
- Patent Title: DATA AUGMENTATION AND BATCH BALANCING FOR TRAINING MULTI-LINGUAL MODEL
-
Application No.: US18485779Application Date: 2023-10-12
-
Publication No.: US20240135116A1Publication Date: 2024-04-25
- Inventor: Duy Vu , Poorya Zaremoodi , Nagaraj N. Bhat , Srijon Sarkar , Varsha Kuppur Rajendra , Thanh Long Duong , Mark Edward Johnson , Pramir Sarkar , Shahid Reza
- Applicant: Oracle International Corporation
- Applicant Address: US CA Redwood Shores
- Assignee: Oracle International Corporation
- Current Assignee: Oracle International Corporation
- Current Assignee Address: US CA Redwood Shores
- Priority: IN 2241058581 2022.10.13
- Main IPC: G06F40/58
- IPC: G06F40/58 ; G06F40/20

Abstract:
A computer-implemented method includes: accessing a plurality of datasets, where each dataset of the plurality of datasets includes training examples; selecting datasets that include the training examples in a source language and a target language; and sampling, based on a sampling weight that is determined for each of the selected datasets, the training examples from the selected datasets to generate the training batches; training an ML model for performing at least a first task using the training examples of the training batches, by interleavingly inputting the training batches to the ML model; and outputting the trained ML model configured to perform the at least the first task on input utterances provided in at least one among the source language and the target language. The sampling weight is determined for each of the selected datasets based on one or more attributes common to the training examples of the selected dataset.
Information query