    Analytic and empirical correction of biased error introduced by approximation methods

    Publication Number: US11604987B2

    Publication Date: 2023-03-14

    Application Number: US16826472

    Application Date: 2020-03-23

    Abstract: Various embodiments include methods, and neural network computing devices implementing the methods, for generating an approximation neural network. Various embodiments may include performing approximation operations on a weights tensor associated with a layer of a neural network to generate an approximation weights tensor, determining an expected output error of the layer in the neural network due to the approximation weights tensor, subtracting the expected output error from a bias parameter of the layer to determine an adjusted bias parameter, and substituting the adjusted bias parameter for the bias parameter in the layer. Such operations may be performed for one or more layers in a neural network to produce an approximation version of the neural network for execution on a resource-limited processor.
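
    The core move in this abstract is a closed-form bias update: for a linear layer y = W x + b, replacing W with an approximation W_q shifts the expected output by E[(W_q - W) x] = (W_q - W) E[x], and that shift can be folded into b. Below is a minimal NumPy sketch of the idea; the uniform quantizer, the function names, and the use of a calibration input mean are illustrative assumptions, not details taken from the patent.

        import numpy as np

        # Illustrative sketch only; the quantizer, shapes, and calibration
        # mean are assumptions, not taken from the patent text.

        def quantize(w, num_bits=8):
            # Uniform symmetric quantizer -- a stand-in for whatever
            # approximation method produces the approximation weights tensor.
            scale = np.max(np.abs(w)) / (2 ** (num_bits - 1) - 1)
            return np.round(w / scale) * scale

        def fold_expected_error_into_bias(weights, bias, input_mean, num_bits=4):
            # For y = W x + b, replacing W with W_q shifts the expected output
            # by (W_q - W) E[x]; subtracting that shift from b cancels the
            # systematic (biased) part of the approximation error.
            w_q = quantize(weights, num_bits)
            expected_error = (w_q - weights) @ input_mean  # per-output-channel shift
            return w_q, bias - expected_error

        rng = np.random.default_rng(0)
        W = rng.normal(size=(64, 128))
        b = rng.normal(size=64)
        mu_x = rng.normal(size=128)   # e.g. estimated from a few calibration batches
        W_q, b_adj = fold_expected_error_into_bias(W, b, mu_x)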

    Channel Gating For Conditional Computation
    Invention Application

    Publication Number: US20200372361A1

    Publication Date: 2020-11-26

    Application Number: US16419509

    Application Date: 2019-05-22

    Abstract: A computing device may be equipped with a generalized framework for accomplishing conditional computation or gating in a neural network. The computing device may receive input in a neural network layer that includes two or more filters. The computing device may intelligently determine whether the two or more filters are relevant to the received input. The computing device may deactivate filters that are determined not to be relevant to the received input (or activate filters that are determined to be relevant to the received input), and apply the received input to active filters in the layer to generate an activation.
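
    The gating recipe in the abstract amounts to: score each filter's relevance to the current input, deactivate the irrelevant ones, and convolve only with the survivors. The Python sketch below illustrates this under assumptions not stated in the abstract: a single convolutional layer, a linear gating head over a globally pooled input, and a fixed score threshold.

        import numpy as np

        def gated_conv(x, filters, gate_w, threshold=0.0):
            # Illustrative sketch; the gating head, threshold, and shapes are
            # assumptions, not taken from the patent text.
            # x:       input feature map, shape (C_in, H, W)
            # filters: convolution weights, shape (num_filters, C_in, kH, kW)
            # gate_w:  gating weights, shape (num_filters, C_in)
            pooled = x.mean(axis=(1, 2))        # global average pool: (C_in,)
            scores = gate_w @ pooled            # one relevance score per filter
            active = scores > threshold         # filters deemed relevant to x

            n, _, kh, kw = filters.shape
            h_out, w_out = x.shape[1] - kh + 1, x.shape[2] - kw + 1
            out = np.zeros((n, h_out, w_out))
            for i in np.flatnonzero(active):    # compute only the active filters;
                for r in range(h_out):          # inactive output channels stay zero
                    for c in range(w_out):
                        out[i, r, c] = np.sum(filters[i] * x[:, r:r + kh, c:c + kw])
            return out, active

        rng = np.random.default_rng(0)
        x = rng.normal(size=(8, 16, 16))
        filters = rng.normal(size=(32, 8, 3, 3))
        gate_w = rng.normal(size=(32, 8))
        out, active = gated_conv(x, filters, gate_w)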

    Systems and methods of cross layer rescaling for improved quantization performance

    Publication Number: US12242956B2

    Publication Date: 2025-03-04

    Application Number: US16826524

    Application Date: 2020-03-23

    Abstract: Various embodiments include methods, and neural network computing devices implementing the methods, for performing quantization in neural networks. Various embodiments may include equalizing ranges of weight tensors or output channel weights within a first layer of the neural network by scaling each of the output channel weights of the first layer by a corresponding scaling factor, and scaling each of a second, adjacent layer's corresponding input channel weights by applying an inverse of the corresponding scaling factor to the input channel weights. The corresponding scaling factor may be determined based on heuristics, equalization of dynamic ranges, equalization of range extrema (minima or maxima), differential learning using straight-through estimator (STE) methods with a local or global loss, or by defining an error metric for the quantization error and applying a black-box optimizer that minimizes the error metric with respect to the scaling.
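
    The identity behind the rescaling is that, for a positively homogeneous activation such as ReLU, scaling output channel i of one layer by s_i and the matching input channel of the next layer by 1/s_i leaves the composed function unchanged, so s_i is free to be chosen to line up the per-channel weight ranges. The NumPy sketch below implements the range-equalization heuristic mentioned in the abstract for two fully connected layers; the function name, shapes, and the specific choice s_i = sqrt(r2_i / r1_i) are illustrative assumptions.

        import numpy as np

        def equalize_cross_layer(w1, b1, w2, eps=1e-12):
            # Illustrative sketch; the heuristic and layer shapes are
            # assumptions, not taken from the patent text.
            # w1: (C, C_in)  -- rows are the first layer's output channels
            # b1: (C,)       -- first-layer bias, scaled along with its channels
            # w2: (C_out, C) -- columns are the second layer's input channels
            r1 = np.max(np.abs(w1), axis=1)    # per-output-channel range, layer 1
            r2 = np.max(np.abs(w2), axis=0)    # per-input-channel range, layer 2
            # With s = sqrt(r2 / r1), both scaled ranges become sqrt(r1 * r2),
            # i.e. the dynamic ranges of the two layers are equalized per channel.
            s = np.sqrt((r2 + eps) / (r1 + eps))
            return w1 * s[:, None], b1 * s, w2 / s[None, :]

    With ReLU between the layers, relu(s * z) = s * relu(z) for s > 0, so the 1/s applied to the next layer's input channels exactly cancels the s applied here and the network output is preserved while per-tensor quantization error drops.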
