Quantized softmax layer for neural networks

    Publication No.: US11861452B1

    Publication date: 2024-01-02

    Application No.: US16443634

    Filing date: 2019-06-17

    Inventor: Ming Kai Hsu

    CPC classification number: G06N3/063 G06F17/18 G06N3/047 G10L25/30

    Abstract: Quantized softmax layers in neural networks are described. Some embodiments involve receiving, at an input to a softmax layer of a neural network from an intermediate layer of the neural network, a non-normalized output comprising a plurality of intermediate network decision values. Then for each intermediate network decision value of the plurality of intermediate network decision values, the embodiment involves: calculating a difference between the intermediate network decision value and a maximum network decision value; requesting, from a lookup table, a corresponding lookup table value using the difference between the intermediate network decision value and the maximum network decision value; and selecting the corresponding lookup table value as a corresponding decision value. A normalized output is then generated comprising the corresponding lookup table value for said each intermediate network decision value of the plurality of intermediate network decision values.
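The lookup flow in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not say what the table holds, so the sketch assumes each entry stores a fixed-point approximation of exp(-difference * scale). The names `build_exp_table` and `lut_softmax`, the `scale` parameter, and the 8-bit output width are all hypothetical.

```python
import math


def build_exp_table(max_diff: int, scale: float, out_bits: int = 8) -> list[int]:
    """Precompute table[d] ~= exp(-d * scale), scaled to an unsigned out_bits integer.

    Assumption: the abstract does not specify the table contents; an
    exponential of the negative difference is one plausible choice.
    """
    top = (1 << out_bits) - 1
    return [round(math.exp(-d * scale) * top) for d in range(max_diff + 1)]


def lut_softmax(values: list[int], table: list[int]) -> list[int]:
    """Map each integer decision value to a table entry indexed by (max - value)."""
    peak = max(values)          # maximum network decision value
    last = len(table) - 1
    out = []
    for v in values:
        diff = peak - v         # always >= 0, so it can index the table directly
        out.append(table[min(diff, last)])  # clamp differences beyond the table
    return out


table = build_exp_table(max_diff=15, scale=0.5)
result = lut_softmax([3, 10, 7], table)
```

In this sketch the largest input always has a difference of zero and so maps to the full-scale entry (255 for 8 bits), while differences larger than the table length are clamped to the last entry. No division or runtime exponential is needed, which is the usual motivation for a table-based design on fixed-point hardware.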

    Constraint-based dynamic quantization adjustment for fixed-point processing

    Publication No.: US11630982B1

    Publication date: 2023-04-18

    Application No.: US16131402

    Filing date: 2018-09-14

    Abstract: Aspects of the present disclosure address systems and methods for fixed-point quantization using a dynamic quantization level adjustment scheme. Consistent with some embodiments, a method comprises accessing a neural network comprising floating-point representations of filter weights corresponding to one or more convolution layers. The method further includes determining a peak value of interest from the filter weights and determining a quantization level for the filter weights based on a number of bits in a quantization scheme. The method further includes dynamically adjusting the quantization level based on one or more constraints. The method further includes determining a quantization scale of the filter weights based on the peak value of interest and the adjusted quantization level. The method further includes quantizing the floating-point representations of the filter weights using the quantization scale to generate fixed-point representations of the filter weights.
