Patent search ap:("NVIDIA Corporation") AND inv:"Rangharajan Venkatesan" Page 1

1.

发明授权
Neural network accelerator using logarithmic-based arithmetic 有权

公开(公告)号：US12118454B2

公开(公告)日：2024-10-15

申请号：US18537570

申请日：2023-12-12

Applicant: NVIDIA Corporation

Inventor： William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany

IPC: G06N3/063 , G06F7/483 , G06F17/16

CPC classification number: G06N3/063 , G06F7/4833 , G06F17/16

Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.

2.

发明申请
ASYNCHRONOUS ACCUMULATOR USING LOGARITHMIC-BASED ARITHMETIC 有权

公开(公告)号：US20210056399A1

公开(公告)日：2021-02-25

申请号：US16750917

申请日：2020-01-23

Applicant: NVIDIA Corporation

Inventor： William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany , Stephen G. Tell

IPC: G06N3/063 , G06F17/16

Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.

3.

发明公开
ASYNCHRONOUS ACCUMULATOR USING LOGARITHMIC-BASED ARITHMETIC 审中-公开

公开(公告)号：US20240311626A1

公开(公告)日：2024-09-19

申请号：US18674632

申请日：2024-05-24

Applicant: NVIDIA Corporation

Inventor： William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany , Stephen G. Tell

IPC: G06N3/063 , G06F7/544 , G06F7/575 , G06F17/16 , G06N3/02 , G06N3/045

CPC classification number: G06N3/063 , G06F7/575 , G06F17/16 , G06N3/02 , G06F7/5443 , G06N3/045

Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.

4.

发明公开
OPTIMALLY CLIPPED TENSORS AND VECTORS 审中-公开

公开(公告)号：US20230237308A1

公开(公告)日：2023-07-27

申请号：US17814957

申请日：2022-07-26

Applicant: NVIDIA Corporation

Inventor： Charbel Sakr , Steve Haihang Dai , Brucek Kurdo Khailany , William James Dally , Rangharajan Venkatesan , Brian Matthew Zimmer

IPC: G06N3/04 , G06N3/08

CPC classification number: G06N3/04 , G06N3/08

Abstract: Quantizing tensors and vectors processed within a neural network reduces power consumption and may accelerate processing. Quantization reduces the number of bits used to represent a value, where decreasing the number of bits used can decrease the accuracy of computations that use the value. Ideally, quantization is performed without reducing accuracy. Quantization-aware training (QAT) is performed by dynamically quantizing tensors (weights and activations) using optimal clipping scalars. “Optimal” in that the mean squared error (MSE) of the quantized operation is minimized and the clipping scalars define the degree or amount of quantization for various tensors of the operation. Conventional techniques that quantize tensors during training suffer from high amounts of noise (error). Other techniques compute the clipping scalars offline through a brute force search to provide high accuracy. In contrast, the optimal clipping scalars can be computed online and provide the same accuracy as the clipping scalars computed offline.

5.

发明申请
FINE-GRAINED PER-VECTOR SCALING FOR NEURAL NETWORK QUANTIZATION 有权

公开(公告)号：US20220067530A1

公开(公告)日：2022-03-03

申请号：US17086118

申请日：2020-10-30

Applicant: NVIDIA Corporation

Inventor： Brucek Kurdo Khailany , Steve Haihang Dai , Rangharajan Venkatesan , Haoxing Ren

IPC: G06N3/08 , G06N3/063 , G06F7/544 , G06F7/523 , G06F7/50

Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.

6.

发明授权
Inference accelerator using logarithmic-based arithmetic 有权

公开(公告)号：US12141225B2

公开(公告)日：2024-11-12

申请号：US16750823

申请日：2020-01-23

Applicant: NVIDIA Corporation

Inventor： William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany

IPC: G06F17/15 , G06F7/483 , G06N20/00

Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.

7.

发明授权
Fine-grained per-vector scaling for neural network quantization 有权

公开(公告)号：US12045307B2

公开(公告)日：2024-07-23

申请号：US17086118

申请日：2020-10-30

Applicant: NVIDIA Corporation

Inventor： Brucek Kurdo Khailany , Steve Haihang Dai , Rangharajan Venkatesan , Haoxing Ren

IPC: G06F17/16 , G06F5/01 , G06F7/544

CPC classification number: G06F17/16 , G06F5/01 , G06F7/5443

Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.

8.

发明授权
Asynchronous accumulator using logarithmic-based arithmetic 有权

公开(公告)号：US12033060B2

公开(公告)日：2024-07-09

申请号：US16750917

申请日：2020-01-23

Applicant: NVIDIA Corporation

Inventor： William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany , Stephen G. Tell

IPC: G06N3/063 , G06F7/544 , G06F7/575 , G06F17/16 , G06N3/02 , G06N3/045

CPC classification number: G06N3/063 , G06F7/575 , G06F17/16 , G06N3/02 , G06F7/5443 , G06N3/045

Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.

9.

发明公开
NEURAL NETWORK ACCELERATOR USING LOGARITHMIC-BASED ARITHMETIC 审中-公开

公开(公告)号：US20240112007A1

公开(公告)日：2024-04-04

申请号：US18537570

申请日：2023-12-12

Applicant: NVIDIA Corporation

Inventor： William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany

IPC: G06N3/063 , G06F7/483 , G06F17/16

CPC classification number: G06N3/063 , G06F7/4833 , G06F17/16

Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.

10.

发明申请
NEURAL NETWORK ACCELERATOR USING LOGARITHMIC-BASED ARITHMETIC 有权

公开(公告)号：US20210056397A1

公开(公告)日：2021-02-25

申请号：US16549683

申请日：2019-08-23

Applicant: NVIDIA Corporation

Inventor： William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany

IPC: G06N3/063 , G06F17/16

Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification