Patent search ap:("NVIDIA Corporation") AND inv:"Rafid Reza Mahmood"

1.

Invention publication
ESTIMATING OPTIMAL TRAINING DATA SET SIZE FOR MACHINE LEARNING MODEL SYSTEMS AND APPLICATIONS
Legal status: under examination (published)

Publication (announcement) number: US20230385687A1

Publication (announcement) date: 2023-11-30

Application number: US17828663

Application date: 2022-05-31

Applicant: NVIDIA Corporation

Inventor: Rafid Reza Mahmood, James Robert Lucas, David Jesus Acuna Marrero, Daiqing Li, Jonah Philion, Jose Manuel Alvarez Lopez, Zhiding Yu, Sanja Fidler, Marc Law

IPC: G06N20/00, G06K9/62

CPC classification number: G06N20/00, G06K9/6265

Abstract: Approaches for training data set size estimation for machine learning model systems and applications are described. Examples include a machine learning model training system that estimates target data requirements for training a machine learning model, given an approximate relationship between training data set size and model performance expressed by one or more validation score estimation functions. To derive a validation score estimation function, a regression data set is generated from training data, and subsets of the regression data set are used to train the machine learning model. A validation score is computed for the subsets and used to compute regression function parameters that curve-fit the selected regression function to the training data set. The validation score estimation function is then solved to output an estimate of the number of additional training samples needed for the validation score estimation function to meet or exceed a target validation score.
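The curve-fitting step described in this abstract can be illustrated with a minimal sketch. The abstract does not name a regression function, so the power law score(n) = 1 - b * n^(-c), the log-log least-squares fit, and the function names below are all assumptions made for illustration, not the method claimed in the application.

```python
import math


def fit_power_law(sizes, scores):
    """Fit score(n) = 1 - b * n**(-c) by least squares in log-log space.

    The power-law form is an assumed regression function; it requires
    every score to be strictly below 1.0.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(1.0 - s) for s in scores]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (b, c)


def additional_samples_needed(sizes, scores, target, current_size):
    """Solve the fitted curve for the size that reaches `target`.

    Returns how many samples beyond `current_size` the fitted curve
    predicts are needed (0 if the target is already reached).
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    b, c = fit_power_law(sizes, scores)
    if c <= 0:
        raise ValueError("fitted curve does not improve with more data")
    required = (b / (1.0 - target)) ** (1.0 / c)
    return max(0, math.ceil(required) - current_size)
```

Here `sizes` holds the subset sizes used for training and `scores` the validation score measured for each subset. The fitted curve is then inverted to find the data set size at which it first meets the target score.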

2.

Invention publication
ESTIMATING OPTIMAL TRAINING DATA SET SIZES FOR MACHINE LEARNING MODEL SYSTEMS AND APPLICATIONS
Legal status: under examination (published)

Publication (announcement) number: US20230376849A1

Publication (announcement) date: 2023-11-23

Application number: US18318212

Application date: 2023-05-16

Applicant: NVIDIA Corporation

Inventor: Rafid Reza Mahmood, Marc Law, James Robert Lucas, Zhiding Yu, Jose Manuel Alvarez Lopez, Sanja Fidler

IPC: G06N20/00

CPC classification number: G06N20/00

Abstract: In various examples, optimal training data set sizes are estimated for machine learning model systems and applications. Systems and methods are disclosed that estimate an amount of data to include in a training data set, where the training data set is then used to train one or more machine learning models to reach a target validation performance. To estimate the amount of training data, subsets of an initial training data set may be used to train the machine learning model(s) in order to determine estimates for the minimum amount of training data needed to train the machine learning model(s) to reach the target validation performance. The estimates may then be used to generate one or more functions, such as a cumulative density function and/or a probability density function, where the function(s) are then used to estimate the amount of training data needed to train the machine learning model(s).
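One simple way to turn a set of per-subset estimates into a single data requirement is sketched below. It builds an empirical cumulative distribution over the estimates and reads off the smallest size that covers a chosen fraction of them. The empirical (step-function) form, the `confidence` parameter, and the function names are assumptions for illustration; the abstract does not say how the function(s) are constructed or queried.

```python
import math
from bisect import bisect_right


def empirical_cdf(estimates):
    """Return F(x) = fraction of estimates that are <= x."""
    ordered = sorted(estimates)
    count = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / count

    return cdf


def size_at_confidence(estimates, confidence):
    """Smallest estimate whose empirical CDF value is >= `confidence`.

    `confidence` is a hypothetical knob: the fraction of per-subset
    estimates that the returned data set size should cover.
    """
    if not 0.0 < confidence <= 1.0:
        raise ValueError("confidence must be in (0, 1]")
    ordered = sorted(estimates)
    rank = max(1, math.ceil(confidence * len(ordered)))
    return ordered[rank - 1]
```

Choosing a higher `confidence` yields a more conservative (larger) data requirement, because more of the individual estimates must fall at or below the returned size.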

3.

Invention publication
OPTIMIZED ACTIVE LEARNING USING INTEGER PROGRAMMING
Legal status: under examination (published)

Publication (announcement) number: US20230244985A1

Publication (announcement) date: 2023-08-03

Application number: US17591039

Application date: 2022-02-02

Applicant: NVIDIA Corporation

Inventor: Rafid Reza Mahmood, Sanja Fidler, Marc Law

IPC: G06N20/00

CPC classification number: G06N20/00

Abstract: In various examples, a representative subset of data points is queried or selected using integer programming to minimize the Wasserstein distance between the selected data points and the data set from which they were selected. A Generalized Benders Decomposition (GBD) may be used to decompose and iteratively solve the minimization problem, providing a globally optimal solution (an identified subset of data points that matches the distribution of its data set) within a threshold tolerance. Data selection may be accelerated by applying one or more constraints while iterating, such as optimality cuts that leverage properties of the Wasserstein distance and/or pruning constraints that reduce the search space of candidate data points. In an active learning implementation, a representative subset of unlabeled data points may be selected using GBD, labeled, and used to train machine learning model(s) over one or more cycles of active learning.
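The selection objective in this abstract can be illustrated on a toy problem. The sketch below does not implement integer programming or Generalized Benders Decomposition. It substitutes an exhaustive search over all size-k subsets, and it restricts the data to one-dimensional points so that the Wasserstein-1 distance has a closed form. Both simplifications, and the function names, are assumptions for illustration only.

```python
from bisect import bisect_right
from itertools import combinations


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two uniform empirical
    distributions on the real line: the integral of |F_a - F_b|."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    distance = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right).
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        distance += abs(cdf_a - cdf_b) * (right - left)
    return distance


def select_subset(data, k):
    """Indices of the k points whose empirical distribution is closest
    to that of `data` in Wasserstein-1 distance.

    Exhaustive search: feasible only for small inputs, and used here
    in place of the integer-programming solver the abstract describes.
    """
    best = min(
        combinations(range(len(data)), k),
        key=lambda idx: wasserstein_1d([data[i] for i in idx], data),
    )
    return list(best)
```

For data made of two well-separated clusters, such as `[0.0, 1.0, 2.0, 10.0, 11.0, 12.0]` with `k = 2`, the search picks one point from the middle of each cluster. The search cost grows combinatorially with the data set size.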
