-
Publication No.: US20250125819A1
Publication Date: 2025-04-17
Application No.: US18985581
Filing Date: 2024-12-18
Applicant: NVIDIA Corporation
Abstract: Vector-scaled hierarchical codebook quantization reduces the precision (bitwidth) of vectors of parameters and may enable energy-efficient acceleration of deep neural networks. A vector (block array) comprises one or more parameters within a single dimension of a multi-dimensional tensor (or kernel). For example, a block array may comprise 4 sub-vectors (blocks), each comprising 8 parameters. The parameters may be represented in integer, floating-point, or any other suitable format. A vector cluster quantization technique is used to quantize blocks of parameters in real time. Hardware circuitry within a datapath identifies an optimal codebook from a plurality of codebooks for quantizing each block of parameters, and the block is encoded using the identified codebook. During processing, the identified codebook is used to obtain the quantized parameters and perform computations at the reduced precision.
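As a rough illustration of the per-block codebook selection described above (a minimal NumPy sketch, not the patented datapath circuitry; BLOCK_SIZE, the codebook values, and all function names are assumptions for illustration only):

import numpy as np

BLOCK_SIZE = 8  # parameters per block (sub-vector), as in the example above

def quantize_block(block, codebooks):
    """Pick the codebook with minimum reconstruction error for one block."""
    best_cb, best_idx, best_err = None, None, np.inf
    for cb_id, codebook in enumerate(codebooks):
        # nearest codebook entry for every parameter in the block
        idx = np.abs(block[:, None] - codebook[None, :]).argmin(axis=1)
        err = np.sum((block - codebook[idx]) ** 2)
        if err < best_err:
            best_cb, best_idx, best_err = cb_id, idx, err
    return best_cb, best_idx  # store the codebook id plus per-parameter indices

def dequantize_block(cb_id, idx, codebooks):
    return codebooks[cb_id][idx]

# example: a vector of 4 blocks of 8 parameters and two 4-entry (2-bit) codebooks
rng = np.random.default_rng(0)
vector = rng.standard_normal((4, BLOCK_SIZE)).astype(np.float32)
codebooks = [np.linspace(-1, 1, 4, dtype=np.float32),
             np.linspace(-2, 2, 4, dtype=np.float32)]
encoded = [quantize_block(block, codebooks) for block in vector]

In this sketch each block stores only a small codebook identifier and low-bitwidth indices, which is where the memory and energy savings come from.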
-
Publication No.: US11886980B2
Publication Date: 2024-01-30
Application No.: US16549683
Filing Date: 2019-08-23
Applicant: NVIDIA Corporation
Inventor: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
CPC classification number: G06N3/063, G06F7/4833, G06F17/16
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
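A minimal numeric sketch of this quotient/remainder decomposition, assuming a base-2 logarithmic format whose integer exponents have 1/N fractional resolution (N, the function name, and the example values are illustrative assumptions, not taken from the patent):

import math

N = 4  # exponent resolution: a stored exponent e represents the value 2**(e / N)

def log_add(exponents):
    """Add values stored as non-negative log-format exponents e (value = 2**(e/N))."""
    # decompose each exponent into quotient and remainder: e = q * N + r
    partial_sums = [0] * N          # one integer partial sum per remainder value
    for e in exponents:
        q, r = divmod(e, N)
        partial_sums[r] += 1 << q   # 2**q accumulates as a cheap shift-and-add
    # multiply each partial sum by its remainder factor 2**(r/N) and combine
    total = sum(ps * 2 ** (r / N) for r, ps in enumerate(partial_sums))
    # convert the linear-domain sum back into the logarithmic format
    return round(N * math.log2(total))

# example: 2**(5/4) + 2**(9/4) + 2**(6/4)
print(log_add([5, 9, 6]))

Grouping terms by remainder means only N multiplications are needed per sum, regardless of how many values are added.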
-
Publication No.: US20220067512A1
Publication Date: 2022-03-03
Application No.: US17086114
Filing Date: 2020-10-30
Applicant: NVIDIA Corporation
Inventor: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
Abstract: Today, neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimization. However, operating the neural networks for these applications consumes energy. Quantization of the parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
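A small sketch of fine-grained per-vector scaling, assuming symmetric signed 4-bit quantization and a vector length of 16 (these constants, names, and the scale-selection rule are illustrative choices, not taken from the patent):

import numpy as np

VECTOR_LEN = 16
QMAX = 7  # largest magnitude representable by a signed 4-bit integer

def quantize_per_vector(row):
    """Quantize one matrix row with one scale factor per vector of 16 elements."""
    vectors = row.reshape(-1, VECTOR_LEN)
    # per-vector scale factor chosen so the largest element maps to QMAX
    scales = np.maximum(np.abs(vectors).max(axis=1, keepdims=True) / QMAX, 1e-8)
    q = np.clip(np.round(vectors / scales), -QMAX - 1, QMAX).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales

row = np.random.default_rng(1).standard_normal(64).astype(np.float32)
q, scales = quantize_per_vector(row)
print(np.abs(dequantize(q, scales).ravel() - row).max())  # worst-case error

Because each 16-element vector gets its own scale factor, an outlier in one vector does not inflate the quantization error of the rest of the matrix.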
-
Publication No.: US20210056446A1
Publication Date: 2021-02-25
Application No.: US16750823
Filing Date: 2020-01-23
Applicant: NVIDIA Corporation
Inventor: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
IPC: G06N5/04
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
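To complement the earlier sketch, the running-accumulator view described in this abstract can be pictured as below (the class and method names are invented for illustration; a real asynchronous hardware accumulator updates its remainder-indexed partial sums concurrently, which plain Python does not capture):

class RemainderAccumulator:
    def __init__(self, n=4):
        self.n = n
        self.partials = [0] * n          # one running partial sum per remainder

    def add(self, exponent):
        q, r = divmod(exponent, self.n)  # decompose e = q * n + r
        self.partials[r] += 1 << q       # shift-and-add; no multiply per term

    def result(self):
        # one multiply per remainder bin, applied only when the sum is read out
        return sum(p * 2 ** (r / self.n) for r, p in enumerate(self.partials))

acc = RemainderAccumulator()
for e in (5, 9, 6):
    acc.add(e)
print(acc.result())  # approximately 2**(5/4) + 2**(9/4) + 2**(6/4)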
-