-
Publication Number: US12299577B2
Publication Date: 2025-05-13
Application Number: US15624577
Filing Date: 2017-06-15
Applicant: NVIDIA CORPORATION
Inventor: Boris Ginsburg , Sergei Nikolaev , Ahmad Kiswani , Hao Wu , Amir Gholaminejad , Slawomir Kierat , Michael Houston , Alex Fit-Florea
Abstract: Aspects of the present invention are directed to computer-implemented techniques for improving the training of artificial neural networks using a reduced-precision (e.g., float16) data format. Embodiments of the present invention rescale tensor values prior to performing matrix operations (such as matrix multiplication or matrix addition) to prevent overflow and underflow. To preserve accuracy throughout the matrix operations, the scale factors are defined using a novel data format in which a tensor is represented by a tuple X = (a, v[.]), where a is a float scale factor and v[.] are scaled values stored in the float16 format. The value of any element X[i] under this data format equals a*v[i].
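The (a, v[.]) representation can be sketched in a few lines. Below is a minimal illustration, assuming NumPy; the class name ScaledTensor, the scale-picking heuristic, and float32 accumulation are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

class ScaledTensor:
    """Sketch of the X = (a, v[.]) format: a float scale factor 'a'
    paired with float16 values v[.]; element i represents a * v[i]."""

    def __init__(self, scale, values_f32):
        self.scale = np.float32(scale)               # scale factor a
        self.values = values_f32.astype(np.float16)  # scaled values v[.]

    @classmethod
    def from_float32(cls, x):
        # Pick a scale so the largest magnitude sits comfortably inside
        # the float16 range, preventing overflow and underflow later.
        m = float(np.abs(x).max())
        scale = m / 1024.0 if m > 0 else 1.0
        return cls(scale, x / scale)

    def to_float32(self):
        return self.scale * self.values.astype(np.float32)

def scaled_matmul(x, y):
    # Multiply the stored float16 values with float32 accumulation; the
    # scale factors multiply separately, so each product term keeps the
    # form (a_x * a_y) * (v_x[i] * v_y[j]).
    prod = x.values.astype(np.float32) @ y.values.astype(np.float32)
    out = ScaledTensor.from_float32(prod)  # re-pick a safe scale
    out.scale = np.float32(out.scale * x.scale * y.scale)
    return out

a = ScaledTensor.from_float32(np.random.randn(64, 64).astype(np.float32))
b = ScaledTensor.from_float32(np.random.randn(64, 64).astype(np.float32))
c = scaled_matmul(a, b)  # c.to_float32() approximates the float32 matmul
```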
-
Publication Number: US20190377549A1
Publication Date: 2019-12-12
Application Number: US16001838
Filing Date: 2018-06-06
Applicant: NVIDIA Corporation
Inventor: Jonah M. Alben , Paulius Micikevicius , Hao Wu , Ming Yiu Siu
IPC: G06F7/499
Abstract: A method, computer readable medium, and system are disclosed for rounding numerical values. A set of bits from an input value is identified as a rounding value. A second set of bits representing a second value is extracted from the input value and added to the rounding value to produce a sum. The sum is truncated to produce the rounded output value. The technique thus rounds an input value up as a function of a second value and a rounding value, both obtained from the input value itself. Because the second value and the rounding value are taken from consistent bit locations of the input value, the resulting output value is deterministic. This deterministic form of stochastic rounding is advantageously applicable in deep learning applications.
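As a concrete illustration of the bit manipulation described above, here is a small Python sketch. The choice of which bits serve as the rounding value (the k bits just above the truncation point) is an assumption for illustration; the patent covers selecting bits from consistent locations generally.

```python
def deterministic_stochastic_round(x: int, k: int) -> int:
    """Round a non-negative integer x to a multiple of 2**k.

    Sketch of the abstract's scheme: a rounding value is taken from
    bits of x itself (here, the k bits just above the truncation
    point; the exact location is an assumption), added to the input,
    and the sum is truncated. Because the rounding bits come from a
    fixed location in the input, the result is deterministic.
    """
    mask = (1 << k) - 1
    rounding = (x >> k) & mask  # k bits of the input reused as the rounding value
    total = x + rounding        # add the rounding value to the value being rounded
    return (total >> k) << k    # truncate the low k bits of the sum

# The reused bits play the role of the random draw in conventional
# stochastic rounding: low bits 0b0110 (6) plus rounding bits 0b1011 (11)
# cross the 2**4 cutoff, so the value rounds up.
print(deterministic_stochastic_round(182, 4))  # -> 192
```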
-
Publication Number: US20180211152A1
Publication Date: 2018-07-26
Application Number: US15838273
Filing Date: 2017-12-11
Applicant: NVIDIA Corporation
Inventor: Szymon Migacz , Hao Wu , Dilip Sequeira , Ujval Kapasi , Maxim Milakov , Slawomir Kierat , Zacky Zhou , Yilin Zhang , Alex Fit-Florea
CPC classification number: G06N3/04 , G06N3/0454 , G06N3/08 , G06N7/00
Abstract: Aspects of the present invention are directed to computer-implemented techniques for performing data compression and conversion between data formats of varying degrees of precision, and more particularly for improving the inferencing (application) of artificial neural networks using a reduced-precision (e.g., INT8) data format. Embodiments of the present invention generate candidate conversions of data output, then employ a relative measure of quality to identify the candidate conversion with the greatest accuracy (i.e., the least divergence from the original higher-precision values). The representation can then be used during inference to perform computations on the resulting output data.
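To make the "relative measure of quality" concrete: the abstract does not name the measure, but KL divergence between the original activation distribution and each candidate quantized distribution is a common choice for this kind of INT8 calibration and is assumed in the sketch below. The function names and histogram parameters are illustrative.

```python
import numpy as np

def kl_divergence(p, q):
    # Relative measure of quality: how much the quantized distribution
    # q diverges from the reference distribution p.
    p = p / p.sum()
    q = q / q.sum()
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / np.maximum(q[m], 1e-12))))

def calibrate_int8_threshold(activations, num_bins=2048, num_levels=128):
    """For each candidate clipping threshold, build the distribution an
    INT8 representation would induce and score it against the original
    histogram; return the threshold with the least divergence."""
    hist, edges = np.histogram(np.abs(activations), bins=num_bins)
    best_div, best_edge = np.inf, edges[-1]
    for i in range(num_levels, num_bins + 1):  # candidate conversions
        ref = hist[:i].astype(np.float64).copy()
        ref[-1] += hist[i:].sum()              # clip outliers into the last bin
        quant = np.zeros(i)                    # collapse i bins to 128 levels,
        step = i / num_levels                  # then spread back for comparison
        for j in range(num_levels):
            lo, hi = int(j * step), int((j + 1) * step)
            quant[lo:hi] = ref[lo:hi].sum() / max(hi - lo, 1)
        d = kl_divergence(ref, quant)
        if d < best_div:
            best_div, best_edge = d, edges[i]
    return best_edge  # the INT8 scale factor would be best_edge / 127
```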
-
Publication Number: US20250148286A1
Publication Date: 2025-05-08
Application Number: US18740361
Filing Date: 2024-06-11
Applicant: NVIDIA Corporation
Inventor: Hao Wu
IPC: G06N3/084
Abstract: Machine learning systems that implement neural networks typically operate in an inference mode or a training mode. In the training mode, inference operations are performed to help guide the training process. Inference-mode operation typically involves forward propagation and intensive access to certain sparse matrices, encoded as a set of vectors. Back propagation and intensive access to transposed versions of the same sparse matrices provide training refinements. Generating a transposed version of a sparse matrix can consume significant additional memory and computation resources. In one embodiment, two additional encoding vectors are generated, providing efficient operations on sparse matrices and also on transposed representations of the same sparse matrices. In a neural network, the efficient operations can reduce the amount of memory needed for backpropagation and reduce power consumption.
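The abstract says only that two additional encoding vectors are generated; one way to realize the idea, assumed in the sketch below, is a CSR-style encoding augmented with a column-major permutation vector and a column-offset vector, so the transpose is traversable without materializing a second matrix. All names are illustrative.

```python
import numpy as np

def encode_with_transpose(dense):
    """Sketch: CSR-style vectors (values, cols, row_ptr) plus two extra
    vectors, perm and col_ptr, that traverse the same nonzeros in
    column-major order. This particular pair is an assumption."""
    rows, cols = np.nonzero(dense)  # nonzeros in row-major order
    values = dense[rows, cols]
    row_ptr = np.searchsorted(rows, np.arange(dense.shape[0] + 1))
    perm = np.lexsort((rows, cols))                # extra vector 1: column-major order
    col_ptr = np.searchsorted(cols[perm],          # extra vector 2: column offsets
                              np.arange(dense.shape[1] + 1))
    return values, rows, cols, row_ptr, col_ptr, perm

def transposed_spmv(values, rows, col_ptr, perm, x, n_cols):
    """Compute A^T @ x using only the two extra vectors, never building
    a transposed copy of the matrix."""
    y = np.zeros(n_cols)
    for c in range(n_cols):
        for k in perm[col_ptr[c]:col_ptr[c + 1]]:  # nonzeros of column c
            y[c] += values[k] * x[rows[k]]
    return y

A = np.array([[0., 2., 0.], [1., 0., 3.]])
values, rows, cols, row_ptr, col_ptr, perm = encode_with_transpose(A)
assert np.allclose(transposed_spmv(values, rows, col_ptr, perm,
                                   np.array([1., 1.]), 3), A.T @ [1., 1.])
```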
-
Publication Number: US20240078433A1
Publication Date: 2024-03-07
Application Number: US18385871
Filing Date: 2023-10-31
Applicant: NVIDIA Corporation
Inventor: Jonah Alben , Paulius Micikevicius , Hao Wu
Abstract: In training a deep neural network using reduced precision, gradient computation operates on larger values without affecting the rest of the training procedure. One technique runs the forward pass of the deep neural network to develop a loss value, scales the loss, computes gradients at reduced precision, and reduces the magnitude of the computed gradients to compensate for the scaling of the loss. In one example non-limiting arrangement, the training forward pass scales the loss value by some factor S and the weight update reduces the weight gradient contribution by 1/S. Several techniques can be used for selecting the scaling factor S and adjusting the weight update.
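A minimal sketch of this scale-by-S / unscale-by-1/S bookkeeping, using PyTorch. For clarity the model stays in float32 here; in actual reduced-precision training the forward and backward math would run in float16. The fixed S = 1024, the tiny model, and static (non-adaptive) scaling are illustrative assumptions.

```python
import torch

S = 1024.0                           # assumed static scaling factor
model = torch.nn.Linear(16, 4)       # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(x, target):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), target)
    (loss * S).backward()            # forward pass: scale the loss by S
    for p in model.parameters():
        p.grad.mul_(1.0 / S)         # weight update: remove the factor, 1/S
    opt.step()
    return loss.item()

step_loss = train_step(torch.randn(8, 16), torch.randint(0, 4, (8,)))
```

Scaling the loss shifts the entire backward pass toward larger gradient magnitudes, keeping small gradients from flushing to zero in float16; dividing by 1/S before the weight update restores the correct step size.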
-
Publication Number: US11842280B2
Publication Date: 2023-12-12
Application Number: US15971884
Filing Date: 2018-05-04
Applicant: NVIDIA Corporation
Inventor: Jonah Alben , Paulius Micikevicius , Hao Wu
Abstract: In training a deep neural network using reduced precision, gradient computation operates on larger values without affecting the rest of the training procedure. One technique runs the forward pass of the deep neural network to develop a loss value, scales the loss, computes gradients at reduced precision, and reduces the magnitude of the computed gradients to compensate for the scaling of the loss. In one example non-limiting arrangement, the training forward pass scales the loss value by some factor S and the weight update reduces the weight gradient contribution by 1/S. Several techniques can be used for selecting the scaling factor S and adjusting the weight update.
-
Publication Number: US20200380369A1
Publication Date: 2020-12-03
Application Number: US16428760
Filing Date: 2019-05-31
Applicant: NVIDIA Corporation
Abstract: Training one or more neural networks using selective updates to weight information of the one or more neural networks. In at least one embodiment, one or more neural networks are trained by at least updating one or more portions of weight information of the one or more neural networks based, at least in part, on metadata that indicates how recently the one or more portions of weight information have been updated.
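The abstract leaves the update rule unspecified; below is a hypothetical sketch of recency-gated updates in which per-block metadata records the step of the last update. The staleness and gradient thresholds, the blockwise granularity, and all names are assumptions for illustration.

```python
import numpy as np

MAX_STALENESS = 4      # assumed: force a refresh after this many skipped steps
GRAD_THRESHOLD = 1e-3  # assumed: otherwise update only on significant gradients

def selective_update(weight_blocks, grad_blocks, last_updated, step, lr=0.01):
    """Update a weight block only if its recency metadata marks it as
    stale, or if its current gradient is large enough to matter."""
    for b in range(len(weight_blocks)):
        stale = step - last_updated[b] >= MAX_STALENESS
        significant = float(np.abs(grad_blocks[b]).max()) >= GRAD_THRESHOLD
        if stale or significant:
            weight_blocks[b] -= lr * grad_blocks[b]
            last_updated[b] = step  # record how recently block b changed

weights = [np.random.randn(8, 8) for _ in range(3)]
last_updated = np.zeros(3, dtype=int)  # metadata: step of each block's last update
for step in range(1, 6):
    grads = [np.random.randn(8, 8) * 1e-2 for _ in range(3)]
    selective_update(weights, grads, last_updated, step)
```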
-
Publication Number: US12008475B2
Publication Date: 2024-06-11
Application Number: US16191201
Filing Date: 2018-11-14
Applicant: NVIDIA Corporation
Inventor: Hao Wu
IPC: G06N3/084
CPC classification number: G06N3/084
Abstract: Machine learning systems that implement neural networks typically operate in an inference mode or a training mode. In the training mode, inference operations are performed to help guide the training process. Inference-mode operation typically involves forward propagation and intensive access to certain sparse matrices, encoded as a set of vectors. Back propagation and intensive access to transposed versions of the same sparse matrices provide training refinements. Generating a transposed version of a sparse matrix can consume significant additional memory and computation resources. In one embodiment, two additional encoding vectors are generated, providing efficient operations on sparse matrices and also on transposed representations of the same sparse matrices. In a neural network, the efficient operations can reduce the amount of memory needed for backpropagation and reduce power consumption.
-
Publication Number: US10997492B2
Publication Date: 2021-05-04
Application Number: US15838273
Filing Date: 2017-12-11
Applicant: NVIDIA Corporation
Inventor: Szymon Migacz , Hao Wu , Dilip Sequeira , Ujval Kapasi , Maxim Milakov , Slawomir Kierat , Zacky Zhou , Yilin Zhang , Alex Fit-Florea
Abstract: Aspects of the present invention are directed to computer-implemented techniques for performing data compression and conversion between data formats of varying degrees of precision, and more particularly for improving the inferencing (application) of artificial neural networks using a reduced-precision (e.g., INT8) data format. Embodiments of the present invention generate candidate conversions of data output, then employ a relative measure of quality to identify the candidate conversion with the greatest accuracy (i.e., the least divergence from the original higher-precision values). The representation can then be used during inference to perform computations on the resulting output data.
-
Publication Number: US10684824B2
Publication Date: 2020-06-16
Application Number: US16001838
Filing Date: 2018-06-06
Applicant: NVIDIA Corporation
Inventor: Jonah M. Alben , Paulius Micikevicius , Hao Wu , Ming Yiu Siu
IPC: G06F7/499
Abstract: A method, computer readable medium, and system are disclosed for rounding numerical values. A set of bits from an input value is identified as a rounding value. A second set of bits representing a second value is extracted from the input value and added to the rounding value to produce a sum. The sum is truncated to produce the rounded output value. The technique thus rounds an input value up as a function of a second value and a rounding value, both obtained from the input value itself. Because the second value and the rounding value are taken from consistent bit locations of the input value, the resulting output value is deterministic. This deterministic form of stochastic rounding is advantageously applicable in deep learning applications.