Invention Grant
- Patent Title: Method and apparatus for quantization, adaptive block partitioning and codebook coding for neural network model compression
- Application No.: US17099202
- Application Date: 2020-11-16
- Publication No.: US11245903B2
- Publication Date: 2022-02-08
- Inventors: Wei Wang, Wei Jiang, Shan Liu
- Applicant: TENCENT AMERICA LLC
- Applicant Address: Palo Alto, CA, US
- Assignee: TENCENT AMERICA LLC
- Current Assignee: TENCENT AMERICA LLC
- Current Assignee Address: Palo Alto, CA, US
- Agency: Sughrue Mion, PLLC
- Main IPC: H04N19/13
- IPC: H04N19/13 ; H04N19/176 ; H04N19/124 ; H04N19/119 ; H04N19/192 ; H04N19/46 ; G06N3/08 ; H04N19/597 ; H04N19/96 ; H04N19/30 ; H04N19/147

Abstract:
A method of quantization, adaptive block partitioning, and codebook coding for neural network model compression is performed by at least one processor and includes determining a saturated maximum value of a multi-dimensional tensor in a layer of a neural network and a bit depth corresponding to the saturated maximum value, and clipping weight coefficients in the multi-dimensional tensor to be within a range of the saturated maximum value. The method further includes quantizing the clipped weight coefficients based on the bit depth, and transmitting, to a decoder, a layer header including the bit depth.
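
For illustration only, below is a minimal Python/NumPy sketch of the clip-then-quantize flow described in the abstract. It assumes a symmetric uniform quantizer and treats the saturated maximum as the tensor's absolute maximum; the function name, step-size formula, and default bit depth are hypothetical choices for this sketch, not the patent's exact procedure.

```python
import numpy as np

def quantize_layer(weights: np.ndarray, bit_depth: int = 8):
    """Sketch of clipping a layer's weight tensor to a saturated maximum
    and quantizing it to integers of a given bit depth.

    Assumptions (not from the patent): symmetric uniform quantization and
    a step size derived from the saturated maximum and bit depth.
    """
    # Saturated maximum value: largest magnitude retained after clipping.
    saturated_max = float(np.max(np.abs(weights)))

    # Clip weight coefficients to the range of the saturated maximum value.
    clipped = np.clip(weights, -saturated_max, saturated_max)

    # Quantize the clipped coefficients based on the bit depth.
    step = saturated_max / (2 ** (bit_depth - 1) - 1)
    quantized = np.round(clipped / step).astype(np.int32)

    # The bit depth would be signaled in a layer header so a decoder can
    # reconstruct approximate weights, e.g. quantized * step.
    return quantized, step, bit_depth
```

A decoder receiving the layer header (bit depth, and in this sketch the step size) could dequantize with `quantized * step`; how the header is actually coded is described in the patent itself, not here.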