-
Publication No.: US20250125819A1
Publication Date: 2025-04-17
Application No.: US18985581
Filing Date: 2024-12-18
Applicant: NVIDIA Corporation
Abstract: Vector-scaled hierarchical codebook quantization reduces the precision (bitwidth) of vectors of parameters and may enable energy-efficient acceleration of deep neural networks. A vector (block array) comprises one or more parameters within a single dimension of a multi-dimensional tensor (or kernel). For example, a block array may comprise 4 sub-vectors (blocks), each comprising 8 parameters. The parameters may be represented in integer, floating-point, or any other suitable format. A vector cluster quantization technique is used to quantize blocks of parameters in real time. Hardware circuitry within a datapath identifies an optimal codebook from a plurality of codebooks for quantizing each block of parameters, and the block is encoded using the identified codebook. During processing, the identified codebook is used to obtain the quantized parameters and perform computations at the reduced precision.
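As a rough illustration of the per-block codebook selection described above (a minimal NumPy sketch, not the patented datapath circuitry; BLOCK_SIZE, the codebook values, and all function names are assumptions for illustration only):

import numpy as np

BLOCK_SIZE = 8  # parameters per block (sub-vector), as in the example above

def quantize_block(block, codebooks):
    """Pick the codebook with minimum reconstruction error for one block."""
    best_cb, best_idx, best_err = None, None, np.inf
    for cb_id, codebook in enumerate(codebooks):
        # nearest codebook entry for every parameter in the block
        idx = np.abs(block[:, None] - codebook[None, :]).argmin(axis=1)
        err = np.sum((block - codebook[idx]) ** 2)
        if err < best_err:
            best_cb, best_idx, best_err = cb_id, idx, err
    return best_cb, best_idx  # store the codebook id plus per-parameter indices

def dequantize_block(cb_id, idx, codebooks):
    return codebooks[cb_id][idx]

# example: a vector of 4 blocks of 8 parameters and two 4-entry (2-bit) codebooks
rng = np.random.default_rng(0)
vector = rng.standard_normal((4, BLOCK_SIZE)).astype(np.float32)
codebooks = [np.linspace(-1, 1, 4, dtype=np.float32),
             np.linspace(-2, 2, 4, dtype=np.float32)]
encoded = [quantize_block(block, codebooks) for block in vector]

In this sketch each block stores only a small codebook identifier and low-bitwidth indices, which is where the memory and energy savings come from.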
-
Publication No.: US11886980B2
Publication Date: 2024-01-30
Application No.: US16549683
Filing Date: 2019-08-23
Applicant: NVIDIA Corporation
Inventor: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
CPC classification number: G06N3/063, G06F7/4833, G06F17/16
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
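A minimal numeric sketch of this quotient/remainder decomposition, assuming a base-2 logarithmic format whose integer exponents have 1/N fractional resolution (N, the function name, and the example values are illustrative assumptions, not taken from the patent):

import math

N = 4  # exponent resolution: a stored exponent e represents the value 2**(e / N)

def log_add(exponents):
    """Add values stored as non-negative log-format exponents e (value = 2**(e/N))."""
    # decompose each exponent into quotient and remainder: e = q * N + r
    partial_sums = [0] * N          # one integer partial sum per remainder value
    for e in exponents:
        q, r = divmod(e, N)
        partial_sums[r] += 1 << q   # 2**q accumulates as a cheap shift-and-add
    # multiply each partial sum by its remainder factor 2**(r/N) and combine
    total = sum(ps * 2 ** (r / N) for r, ps in enumerate(partial_sums))
    # convert the linear-domain sum back into the logarithmic format
    return round(N * math.log2(total))

# example: 2**(5/4) + 2**(9/4) + 2**(6/4)
print(log_add([5, 9, 6]))

Grouping terms by remainder means only N multiplications are needed per sum, regardless of how many values are added.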
-
Publication No.: US20220067512A1
Publication Date: 2022-03-03
Application No.: US17086114
Filing Date: 2020-10-30
Applicant: NVIDIA Corporation
Inventor: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
Abstract: Today, neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimization. However, operating the neural networks for these applications consumes energy. Quantization of the parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
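A small sketch of fine-grained per-vector scaling, assuming symmetric signed 4-bit quantization and a vector length of 16 (these constants, names, and the scale-selection rule are illustrative choices, not taken from the patent):

import numpy as np

VECTOR_LEN = 16
QMAX = 7  # largest magnitude representable by a signed 4-bit integer

def quantize_per_vector(row):
    """Quantize one matrix row with one scale factor per vector of 16 elements."""
    vectors = row.reshape(-1, VECTOR_LEN)
    # per-vector scale factor chosen so the largest element maps to QMAX
    scales = np.maximum(np.abs(vectors).max(axis=1, keepdims=True) / QMAX, 1e-8)
    q = np.clip(np.round(vectors / scales), -QMAX - 1, QMAX).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales

row = np.random.default_rng(1).standard_normal(64).astype(np.float32)
q, scales = quantize_per_vector(row)
print(np.abs(dequantize(q, scales).ravel() - row).max())  # worst-case error

Because each 16-element vector gets its own scale factor, an outlier in one vector does not inflate the quantization error of the rest of the matrix.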
-
Publication No.: US20210056446A1
Publication Date: 2021-02-25
Application No.: US16750823
Filing Date: 2020-01-23
Applicant: NVIDIA Corporation
Inventor: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
IPC: G06N5/04
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
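To complement the earlier sketch, the running-accumulator view described in this abstract can be pictured as below (the class and method names are invented for illustration; a real asynchronous hardware accumulator updates its remainder-indexed partial sums concurrently, which plain Python does not capture):

class RemainderAccumulator:
    def __init__(self, n=4):
        self.n = n
        self.partials = [0] * n          # one running partial sum per remainder

    def add(self, exponent):
        q, r = divmod(exponent, self.n)  # decompose e = q * n + r
        self.partials[r] += 1 << q       # shift-and-add; no multiply per term

    def result(self):
        # one multiply per remainder bin, applied only when the sum is read out
        return sum(p * 2 ** (r / self.n) for r, p in enumerate(self.partials))

acc = RemainderAccumulator()
for e in (5, 9, 6):
    acc.add(e)
print(acc.result())  # approximately 2**(5/4) + 2**(9/4) + 2**(6/4)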
-