Method and apparatus for neural network quantization
Abstract:
Apparatuses and methods of manufacturing same, systems, and methods for performing network parameter quantization in deep neural networks are described. In one aspect, diagonals of a second-order partial derivative matrix (a Hessian matrix) of a loss function of network parameters of a neural network are determined and then used to weight (Hessian-weighting) the network parameters as part of quantizing the network parameters. In another aspect, the neural network is trained using first and second moment estimates of gradients of the network parameters and then the second moment estimates are used to weight the network parameters as part of quantizing the network parameters. In yet another aspect, network parameter quantization is performed by using an entropy-constrained scalar quantization (ECSQ) iterative algorithm. In yet another aspect, network parameter quantization is performed by quantizing the network parameters of all layers of a deep neural network together at once.
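As an illustration of the Hessian-weighting aspect, the following is a minimal sketch rather than the claimed implementation. It assumes quantization is done by k-means clustering of scalar parameters, with each parameter's contribution to its cluster center weighted by the corresponding Hessian diagonal. The function name `hessian_weighted_kmeans`, the uniform initialization of the centers, and the fixed iteration count are hypothetical choices made for this example. Under the second aspect of the abstract, the second moment estimates of the gradients could be passed in place of `hessian_diag`.

```python
import numpy as np


def hessian_weighted_kmeans(weights, hessian_diag, num_clusters, num_iters=20):
    """Cluster scalar parameters, weighting each by its Hessian diagonal.

    Hypothetical helper for illustration only; not taken from the patent.
    Returns (cluster centers, cluster index assigned to each parameter).
    """
    w = np.asarray(weights, dtype=np.float64).ravel()
    h = np.asarray(hessian_diag, dtype=np.float64).ravel()

    # Assumption: initialize centers uniformly over the parameter range.
    centers = np.linspace(w.min(), w.max(), num_clusters)

    for _ in range(num_iters):
        # Assignment step: each parameter goes to its nearest center.
        # The per-parameter weight h_i scales every candidate distance
        # equally, so it does not change which center is nearest.
        assign = np.argmin((w[:, None] - centers[None, :]) ** 2, axis=1)

        # Update step: each center becomes the Hessian-weighted mean
        # of the parameters assigned to it.
        for k in range(num_clusters):
            mask = assign == k
            total = h[mask].sum()
            if mask.any() and total > 0:
                centers[k] = np.sum(h[mask] * w[mask]) / total

    # Final assignment against the updated centers.
    assign = np.argmin((w[:, None] - centers[None, :]) ** 2, axis=1)
    return centers, assign
```

Parameters with larger Hessian diagonals pull their cluster center closer to themselves, so the quantization error is pushed toward parameters to which the loss is less sensitive. Flattening the parameters of all layers into one array before calling the function would correspond to quantizing all layers together at once.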