Abstract:
A method of reducing computational complexity for a fixed-point neural network operating in a system with a limited multiplier-accumulator (MAC) bit width includes reducing the number of bit shift operations when computing activations in the fixed-point neural network. The method also includes balancing the amount of quantization error against the amount of overflow error when computing activations in the fixed-point neural network.
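A minimal NumPy sketch of the quantization/overflow trade-off the abstract describes, assuming a signed fixed-point format with a chosen number of fractional bits; the function names and the mean-squared-error criterion are illustrative assumptions, not the patented method itself. Allocating more fractional bits shrinks quantization error but narrows the representable range (raising overflow/saturation error), so one can search for the fractional-bit count that balances the two:

```python
import numpy as np

def quantize(x, total_bits, frac_bits):
    """Quantize x to signed fixed point with frac_bits fractional bits,
    saturating on overflow (hypothetical helper, for illustration)."""
    scale = 2 ** frac_bits
    qmax = 2 ** (total_bits - 1) - 1
    qmin = -2 ** (total_bits - 1)
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale

def best_frac_bits(x, total_bits=8):
    """Pick the fractional-bit count minimizing mean squared error,
    trading quantization error (favors more fractional bits) against
    overflow/saturation error (favors fewer fractional bits)."""
    errors = {f: np.mean((x - quantize(x, total_bits, f)) ** 2)
              for f in range(total_bits)}
    return min(errors, key=errors.get)
```

For example, small-magnitude activations push `best_frac_bits` toward the maximum fractional precision, while activations with large outliers push it toward fewer fractional bits to avoid saturation.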
Abstract:
A method of address translation of images and filters to virtual matrices to perform a convolution by matrix multiplication includes receiving an image and a filter, each of which has a memory address. The method also includes mapping the memory addresses to virtual matrix addresses based on a calculated linearized image and a calculated linearized filter. The method further includes converting data in the virtual matrix to a predefined internal format. The method still further includes convolving the image by matrix multiplication of the data in the predefined internal format based on the virtual matrix addresses.
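The linearization the abstract describes resembles the well-known im2col transformation, in which each receptive field of the image becomes one row of a matrix so that convolution reduces to a matrix multiply. The sketch below materializes that matrix explicitly for clarity, whereas the abstract's virtual matrices are an address mapping that need not copy the data; function names are illustrative assumptions, and "convolution" here is the cross-correlation conventional in neural networks:

```python
import numpy as np

def im2col(image, kh, kw):
    """Linearize image patches into rows of a (virtual) matrix:
    each row holds one kh x kw receptive field, flattened."""
    ih, iw = image.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    cols = np.empty((oh * ow, kh * kw))
    for r in range(oh):
        for c in range(ow):
            cols[r * ow + c] = image[r:r + kh, c:c + kw].ravel()
    return cols

def conv2d_via_matmul(image, kernel):
    """Convolve (cross-correlate) by multiplying the linearized image
    with the linearized (flattened) filter."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = im2col(image, kh, kw) @ kernel.ravel()
    return out.reshape(oh, ow)
```

A 4x4 image convolved with a 2x2 filter of ones yields a 3x3 output whose entries are the sums of each 2x2 patch, matching a direct sliding-window computation.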