BIAS SCALING FOR N-BIT CONSTRAINED HARDWARE ACCELERATION
Abstract:
In described examples, an integrated circuit includes a memory storing weights and biases, an N-bit fixed point matrix operations accelerator, and a processor. Starting with a first convolution layer, a convolution layer modeled using the processor receives input feature values. A feature scale and weight scale are reduced if an accumulator scale is greater than a maximum bias scale. The input feature values are rescaled using the feature scale, the weights are quantized using the weight scale, and the biases are quantized using the feature scale and weight scale. The rescaled input feature values and quantized weights and biases are convolved using the N-bit fixed point matrix operations accelerator to generate output feature values. The process repeats from the receive action using the output feature values as the input feature values of the next convolution layer. The process then repeats for all layers, feeding back an output feature range.
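The scale-reduction step can be illustrated with a minimal sketch. The abstract does not say how the maximum bias scale is computed or how the reduction is split between the two scales, so the following are assumptions made only for illustration: a quantized value equals the real value multiplied by its scale, the accumulator scale is the product of the feature scale and the weight scale, the maximum bias scale is the largest scale at which the largest-magnitude bias still fits in a signed N-bit integer, and both scales are shrunk by the same factor. The names `constrain_scales` and `quantize` are hypothetical.

```python
import math


def constrain_scales(feature_scale, weight_scale, biases, n_bits=8):
    """Reduce the feature and weight scales when the accumulator scale
    exceeds the maximum bias scale (illustrative assumptions only)."""
    qmax = 2 ** (n_bits - 1) - 1              # largest signed N-bit value
    max_abs_bias = max(abs(b) for b in biases)
    if max_abs_bias == 0:
        return feature_scale, weight_scale    # all-zero biases always fit

    # Assumed definition: largest scale at which every bias fits in N bits.
    max_bias_scale = qmax / max_abs_bias
    # Assumed definition: biases share the accumulator's scale.
    accumulator_scale = feature_scale * weight_scale

    if accumulator_scale > max_bias_scale:
        # Assumed policy: shrink both scales by the same factor so that
        # their product equals the maximum bias scale.
        shrink = math.sqrt(max_bias_scale / accumulator_scale)
        feature_scale *= shrink
        weight_scale *= shrink
    return feature_scale, weight_scale


def quantize(values, scale, n_bits=8):
    """Scale, round, and clamp values to the signed N-bit integer range."""
    qmax = 2 ** (n_bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(v * scale))) for v in values]


# Usage: the biases are quantized with the product of the two scales.
fs, ws = constrain_scales(16.0, 64.0, biases=[2.0, -4.0], n_bits=8)
q_bias = quantize([2.0, -4.0], fs * ws, n_bits=8)
```

In this sketch the clamp in `quantize` only acts as a safety net; after `constrain_scales` the scaled biases are already within the N-bit range, apart from floating-point rounding.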