ENERGY-EFFICIENT DATAPATH FOR VECTOR-SCALED HIERARCHICAL CODEBOOK QUANTIZATION

    Publication Number: US20250125819A1

    Publication Date: 2025-04-17

    Application Number: US18985581

    Application Date: 2024-12-18

    Abstract: Vector-scaled hierarchical codebook quantization reduces the precision (bitwidth) of vectors of parameters and may enable energy-efficient acceleration of deep neural networks. A vector (block array) comprises one or more parameters within a single dimension of a multi-dimensional tensor (or kernel). For example, a block array may comprise 4 sub-vectors (blocks), each comprising 8 parameters. The parameters may be represented in integer, floating-point, or any other suitable format. A vector-cluster quantization technique is used to quantize blocks of parameters in real time. Hardware circuitry within a datapath identifies an optimal codebook from a plurality of codebooks for quantizing each block of parameters, and the block is encoded using the identified codebook. During processing, the identified codebook is used to obtain the quantized parameters and perform computations at the reduced precision.
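    As a rough illustration of the per-block codebook selection this abstract describes, the following NumPy sketch picks, for each block, whichever of several candidate codebooks minimizes quantization error, and encodes the block as indices into that codebook. The function names, the squared-error criterion, and the example 2-bit codebooks are illustrative assumptions; this is a software sketch, not the patent's hardware datapath.

```python
import numpy as np

def encode_block(block, codebooks):
    """Select the codebook that minimizes squared quantization error for
    this block, then encode each parameter as an index into that codebook."""
    best = None
    for cb_id, cb in enumerate(codebooks):
        # Nearest codebook entry for every parameter in the block.
        idx = np.argmin(np.abs(block[:, None] - cb[None, :]), axis=1)
        err = np.sum((block - cb[idx]) ** 2)  # squared quantization error
        if best is None or err < best[0]:
            best = (err, cb_id, idx)
    _, cb_id, idx = best
    return cb_id, idx

def decode_block(cb_id, idx, codebooks):
    """Recover the reduced-precision parameters for computation."""
    return codebooks[cb_id][idx]

# A block of 8 parameters and two candidate 2-bit (4-entry) codebooks.
block = np.array([0.9, -0.4, 0.2, -1.3, 0.6, 0.1, -0.8, 1.4], dtype=np.float32)
codebooks = [np.array([-1.5, -0.5, 0.5, 1.5], dtype=np.float32),
             np.array([-0.75, -0.25, 0.25, 0.75], dtype=np.float32)]
cb_id, idx = encode_block(block, codebooks)
print(cb_id, idx, decode_block(cb_id, idx, codebooks))
```

    Each encoded block then carries only its codebook identifier plus a low-bitwidth index per parameter, which is what allows computation to proceed at the reduced precision.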

VECTOR CLUSTERED QUANTIZATION

    Invention Publication

    Publication Number: US20240354570A1

    Publication Date: 2024-10-24

    Application Number: US18731069

    Application Date: 2024-05-31

    CPC classification number: G06N3/08 G06N3/0495

    Abstract: Vector clustered quantization reduces the precision (bitwidth) of vectors of parameters and may enable energy-efficient acceleration of deep neural networks. A vector comprises one or more parameters within a single dimension of a multi-dimensional tensor (matrix or kernel). A set of quantizers is initialized for the first step (vector-clustering). After initialization, vectors are mapped into clusters based on quantization errors, where each cluster is associated with a different quantizer. During the second step (per-cluster quantization), each quantizer is optimized to quantize the vectors in its associated cluster. In an embodiment, the quantizers are optimized using the Lloyd-Max algorithm, which effectively minimizes the per-cluster quantization noise. The first and second steps may be repeated before the vectors are quantized for processing by a neural network model.
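    A minimal NumPy sketch of the two-step loop this abstract describes, assuming simple scalar quantizers: step one maps each vector to the quantizer with the smallest quantization error, and step two re-optimizes each quantizer on its cluster via Lloyd-Max iterations. The cluster and level counts, the random initialization, and all function names are illustrative assumptions; only the two alternating steps and the use of Lloyd-Max come from the abstract.

```python
import numpy as np

def quantize(values, levels):
    """Map each value to its nearest level; return (indices, squared error)."""
    idx = np.argmin(np.abs(values[..., None] - levels), axis=-1)
    err = np.sum((values - levels[idx]) ** 2)
    return idx, err

def lloyd_max(values, levels, iters=10):
    """One-dimensional Lloyd-Max: alternate nearest-level assignment and
    centroid (mean) updates, minimizing per-cluster quantization noise."""
    for _ in range(iters):
        idx, _ = quantize(values, levels)
        for k in range(len(levels)):
            sel = values[idx == k]
            if sel.size:
                levels[k] = sel.mean()
    return levels

def vector_clustered_quantization(vectors, n_clusters=4, n_levels=4, rounds=5):
    """Two-step loop: (1) vector-clustering by quantization error,
    (2) per-cluster quantizer optimization via Lloyd-Max."""
    rng = np.random.default_rng(0)
    flat = vectors.reshape(-1)
    # Initialize each quantizer's levels from randomly sampled parameters.
    quantizers = [np.sort(rng.choice(flat, n_levels)) for _ in range(n_clusters)]
    for _ in range(rounds):
        # Step 1: assign each vector to the quantizer with the lowest error.
        assign = np.array([np.argmin([quantize(v, q)[1] for q in quantizers])
                           for v in vectors])
        # Step 2: re-optimize each quantizer on its cluster's parameters.
        for c in range(n_clusters):
            members = vectors[assign == c]
            if members.size:
                quantizers[c] = lloyd_max(members.reshape(-1), quantizers[c])
    return quantizers, assign

# Example: 64 vectors of 8 parameters, 4 quantizers of 4 levels (2 bits) each.
vectors = np.random.default_rng(1).normal(size=(64, 8)).astype(np.float32)
quantizers, assign = vector_clustered_quantization(vectors)
```

    Repeating the outer loop corresponds to the abstract's note that the first and second steps may be iterated before the vectors are finally quantized for the neural network model.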
