OPTIMIZING LOW PRECISION INFERENCE MODELS FOR DEPLOYMENT OF DEEP NEURAL NETWORKS

Invention Application

US20230118802A1 OPTIMIZING LOW PRECISION INFERENCE MODELS FOR DEPLOYMENT OF DEEP NEURAL NETWORKS (In force)

Patent Title: OPTIMIZING LOW PRECISION INFERENCE MODELS FOR DEPLOYMENT OF DEEP NEURAL NETWORKS
Application No.: US17929023

Application Date: 2020-03-13
Publication No.: US20230118802A1

Publication Date: 2023-04-20
Inventor: Jiong Gong, Yong Wu, Haihao Shen, Xiao Dong Lin, Guoming Zhang, Feng Yuan
Applicant: Intel Corporation
Applicant Address: US CA Santa Clara
Assignee: Intel Corporation
Current Assignee: Intel Corporation
Current Assignee Address: US CA Santa Clara
International Application: PCT/CN2020/079161 WO 20200313
Main IPC: G06N3/0495
IPC: G06N3/0495; G06N3/08

Abstract:

Systems, apparatuses and methods may provide technology for optimizing an inference neural network model that performs asymmetric quantization. The technology may do so by generating a quantized neural network, in which model weights of the neural network are quantized as signed integer values and an input layer of the neural network is configured to quantize input values as unsigned integer values; generating a weights accumulation table based on the quantized model weights and a kernel size for the neural network; and generating an output restoration function for an output layer of the neural network based on the weights accumulation table and the kernel size. The technology may also perform per-input channel quantization and mixed-precision auto-tuning.
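
The abstract does not give formulas, so the following is only an illustrative sketch of one common way these three pieces can fit together, not the method claimed in the application. It assumes symmetric per-output-channel int8 weights, a single uint8 scale and zero point for the input, and a kernel flattened to a vector of length kernel_size. Under those assumptions the "weights accumulation table" is taken to be the per-output-channel sum of quantized weights, and the "output restoration function" subtracts the zero-point contribution before rescaling. All function names here are hypothetical.

```python
import numpy as np

def quantize_weights_int8(w):
    """Symmetric signed 8-bit quantization, one scale per output channel.

    w has shape (out_channels, kernel_size), kernel already flattened.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.ravel()

def quantize_input_uint8(x):
    """Asymmetric unsigned 8-bit quantization with a zero point."""
    lo, hi = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def weights_accumulation_table(q_w):
    """Assumed form of the table: sum of quantized weights per output channel."""
    return q_w.astype(np.int32).sum(axis=1)

def restore_output(acc, table, w_scale, x_scale, zero_point):
    """Assumed restoration: remove the zero-point term, then rescale to float.

    Uses sum(q_w * (q_x - z)) == sum(q_w * q_x) - z * sum(q_w).
    """
    return w_scale * x_scale * (acc - zero_point * table)

# Small demonstration on random data (values are arbitrary).
rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, size=(4, 9))   # 4 output channels, 3x3 kernel
x = rng.uniform(-1.0, 2.0, size=9)        # one flattened input patch

q_w, w_scale = quantize_weights_int8(w)
q_x, x_scale, zero_point = quantize_input_uint8(x)
table = weights_accumulation_table(q_w)   # computed once, offline

# Integer-only accumulation, as a low-precision kernel would do it.
acc = q_w.astype(np.int32) @ q_x.astype(np.int32)
y_restored = restore_output(acc, table, w_scale, x_scale, zero_point)

print("float reference:", w @ x)
print("restored output:", y_restored)
```

In this sketch the table depends only on the quantized weights, so it can be precomputed before deployment; the inner loop then works on raw uint8 inputs and int8 weights, and the zero-point correction is applied once per output value.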