NEURAL NETWORK LAYER OPTIMIZATION
    Invention publication

    Publication (announcement) number: US20240062059A1

    Publication (announcement) date: 2024-02-22

    Application number: US18191700

    Filing date: 2023-03-28

    CPC classification number: G06N3/08

    Abstract: Various examples disclosed herein relate to neural network quantization techniques, and more particularly, to selecting inference precisions for the layers of the neural network. In an example embodiment, a method is provided herein that includes determining an accuracy improvement of a layer of a neural network implemented using a first bit precision relative to using a second bit precision and determining a latency degradation of the layer of the neural network implemented using the first bit precision relative to using the second bit precision. The method further includes selecting, based on the accuracy improvement and the latency degradation, the first bit precision or the second bit precision for use in implementing the layer of the neural network.
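
The per-layer selection described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not state the decision rule, so the gain-per-millisecond threshold, the `LayerProfile` type, and the 16-bit/8-bit choice of precisions are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LayerProfile:
    name: str
    accuracy_gain: float    # accuracy improvement of the first precision over the second
    latency_cost_ms: float  # latency degradation of the first precision over the second


def select_precision(layer, first_bits=16, second_bits=8, min_gain_per_ms=0.01):
    """Pick a bit precision for one layer (hypothetical rule: gain per ms of latency)."""
    if layer.accuracy_gain <= 0:
        return second_bits  # no accuracy benefit, so take the faster precision
    if layer.latency_cost_ms <= 0:
        return first_bits   # accuracy benefit at no latency cost
    ratio = layer.accuracy_gain / layer.latency_cost_ms
    return first_bits if ratio >= min_gain_per_ms else second_bits


layers = [
    LayerProfile("conv1", accuracy_gain=0.8, latency_cost_ms=2.0),
    LayerProfile("conv2", accuracy_gain=0.01, latency_cost_ms=5.0),
    LayerProfile("fc", accuracy_gain=0.0, latency_cost_ms=1.0),
]
plan = {layer.name: select_precision(layer) for layer in layers}
```

Here `conv1` keeps the higher precision because its accuracy gain is large relative to its latency cost, while `conv2` and `fc` fall back to the lower precision.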

    BIAS SCALING FOR N-BIT CONSTRAINED HARDWARE ACCELERATION

    Publication (announcement) number: US20220164411A1

    Publication (announcement) date: 2022-05-26

    Application number: US17528472

    Filing date: 2021-11-17

    Abstract: In described examples, an integrated circuit includes a memory storing weights and biases, an N-bit fixed point matrix operations accelerator, and a processor. Starting with a first convolution layer, a convolution layer modeled using the processor receives input feature values. A feature scale and weight scale are reduced if an accumulator scale is greater than a maximum bias scale. The input feature values are rescaled using the feature scale, the weights are quantized using the weight scale, and the biases are quantized using the feature scale and weight scale. The rescaled input feature values and quantized weights and biases are convolved using the N-bit fixed point matrix operations accelerator to generate output feature values. The process repeats from the receive action using the output feature values as the input feature values of the next convolution layer. The process then repeats for all layers, feeding back an output feature range.
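
The scale check in this abstract can be illustrated with a short sketch. It assumes that the accumulator scale is the product of the feature scale and the weight scale, that the maximum bias scale is the largest scale at which the largest bias still fits in a signed N-bit integer, and that both scales are shrunk by the same factor; none of these details is given in the abstract, and the function names are hypothetical.

```python
import math


def constrain_scales(feature_scale, weight_scale, biases, n_bits=16):
    """Reduce both scales when the accumulator scale exceeds the maximum bias scale."""
    max_abs_bias = max(abs(b) for b in biases)
    if max_abs_bias == 0:
        return feature_scale, weight_scale
    qmax = 2 ** (n_bits - 1) - 1
    max_bias_scale = qmax / max_abs_bias          # assumption: bias must fit in N bits
    acc_scale = feature_scale * weight_scale      # assumption: accumulator scale is the product
    if acc_scale > max_bias_scale:
        shrink = math.sqrt(max_bias_scale / acc_scale)  # assumption: split the reduction evenly
        feature_scale *= shrink
        weight_scale *= shrink
    return feature_scale, weight_scale


def quantize_biases(biases, feature_scale, weight_scale, n_bits=16):
    """Quantize biases at the accumulator scale, saturating to the signed N-bit range."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = feature_scale * weight_scale
    return [max(-qmax - 1, min(qmax, round(b * scale))) for b in biases]


fs, ws = constrain_scales(64.0, 128.0, [10.0, -2.5], n_bits=16)
q_biases = quantize_biases([10.0, -2.5], fs, ws, n_bits=16)
```

With these inputs the original accumulator scale of 8192 would overflow a 16-bit bias, so both scales are reduced until the largest bias lands at the edge of the 16-bit range.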

    Methods and systems for masking multimedia data

    Publication (announcement) number: US10200695B2

    Publication (announcement) date: 2019-02-05

    Application number: US15063234

    Filing date: 2016-03-07

    Abstract: Several methods and systems for masking multimedia data are disclosed. In an embodiment, a method for masking includes performing a prediction for at least one multimedia data block based on a prediction mode of a plurality of prediction modes. The at least one multimedia data block is associated with a region of interest (ROI). A residual multimedia data associated with the at least one multimedia data block is generated based on the prediction. A quantization of the residual multimedia data is performed based on a quantization parameter (QP) value. The QP value is variable such that varying the QP value controls a degree of masking of the ROI.
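
The effect of the QP value on masking can be shown with a small sketch. It assumes an H.264-style step size, where the quantization step doubles every 6 QP units and equals 1.0 at QP 4; the abstract does not name a codec, so this mapping and the function names are assumptions for illustration only.

```python
def qstep(qp):
    """Quantization step size, assuming an H.264-style mapping (doubles every 6 QP)."""
    return 2 ** ((qp - 4) / 6)


def quantize_residual(residual, qp):
    """Quantize and reconstruct residual samples; a larger QP discards more detail."""
    step = qstep(qp)
    return [round(r / step) * step for r in residual]


residual = [3, -2, 1]
light = quantize_residual(residual, qp=4)   # step 1.0: residual survives intact
heavy = quantize_residual(residual, qp=40)  # step 64.0: residual collapses to zero
```

Raising the QP for blocks inside the ROI removes the residual detail there, which is one way to read "a degree of masking" controlled by the QP value.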

    DYNAMIC QUANTIZATION FOR DEEP NEURAL NETWORK INFERENCE SYSTEM AND METHOD

    Publication (announcement) number: US20190012559A1

    Publication (announcement) date: 2019-01-10

    Application number: US16028773

    Filing date: 2018-07-06

    Abstract: A method for dynamically quantizing feature maps of a received image. The method includes convolving an image based on a predicted maximum value, a predicted minimum value, trained kernel weights and the image data. The input data is quantized based on the predicted minimum value and predicted maximum value. The output of the convolution is computed into an accumulator and re-quantized. The re-quantized value is output to an external memory. The predicted min value and the predicted max value are computed based on the previous max values and min values with a weighted average or a pre-determined formula. Initial min value and max value are computed based on known quantization methods and utilized for initializing the predicted min value and predicted max value in the quantization process.
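
The predicted-range update in this abstract can be sketched as an exponentially weighted average of the observed minimum and maximum. The weighting constant, the unsigned 8-bit output format, and the names `RangePredictor` and `quantize` are assumptions for the example; the abstract only says the prediction uses "a weighted average or a pre-determined formula".

```python
class RangePredictor:
    """Tracks predicted min/max values across frames with a weighted average."""

    def __init__(self, init_min, init_max, weight=0.9):
        self.pred_min = init_min
        self.pred_max = init_max
        self.weight = weight  # share of the previous prediction kept at each update

    def update(self, observed_min, observed_max):
        w = self.weight
        self.pred_min = w * self.pred_min + (1 - w) * observed_min
        self.pred_max = w * self.pred_max + (1 - w) * observed_max


def quantize(values, lo, hi, n_bits=8):
    """Map values from [lo, hi] to unsigned n-bit integers, saturating out-of-range values."""
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [min(levels, max(0, round((v - lo) / scale))) for v in values]


predictor = RangePredictor(init_min=0.0, init_max=1.0, weight=0.5)
frame_outputs = [0.0, 2.0]  # stand-in for one frame's accumulator outputs

# Quantize with the range predicted before this frame, then fold in what was observed.
quantized = quantize(frame_outputs, predictor.pred_min, predictor.pred_max)
predictor.update(min(frame_outputs), max(frame_outputs))
```

The value 2.0 exceeds the predicted maximum and saturates for this frame; the update then widens the predicted range so the next frame is quantized with a better fit.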

    QUANTIZATION FOR NEURAL NETWORKS
    Invention application

    Publication (announcement) number: US20250045572A1

    Publication (announcement) date: 2025-02-06

    Application number: US18408351

    Filing date: 2024-01-09

    Abstract: Disclosed herein are systems and methods for performing post training quantization. A processor obtains fixed-point output values from a layer of an artificial neural network (ANN) wherein the layer includes fixed-point weights determined based on floating-point weights and a weight scaling factor determined based on an output scaling factor. Next, the processor converts the fixed-point output values to floating-point output values based on the output scaling factor. Then, the processor expands a range of floating-point values. Next, the processor calculates a new output scaling factor based on the expanded range of floating-point output values. Finally, the processor stores the new output scaling factor in an associated memory.
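
The recalibration steps in this abstract can be sketched in a few lines. The sketch assumes the output scaling factor maps floating-point values to fixed-point ones by multiplication, that the range is expanded by a fixed margin, and that outputs are signed N-bit integers; the `expand` margin and the function name are hypothetical.

```python
def recalibrate_output_scale(fixed_outputs, output_scale, n_bits=8, expand=1.25):
    """Derive a new output scaling factor from a layer's fixed-point outputs."""
    qmax = 2 ** (n_bits - 1) - 1
    # Convert fixed-point outputs back to floating point (assumes fixed = float * scale).
    float_outputs = [q / output_scale for q in fixed_outputs]
    # Expand the observed range by a margin (assumed multiplicative).
    expanded_max = max(abs(v) for v in float_outputs) * expand
    if expanded_max == 0:
        return output_scale  # nothing observed, keep the current factor
    # New factor maps the expanded range onto the full fixed-point range.
    return qmax / expanded_max


new_scale = recalibrate_output_scale([127, -64, 32], output_scale=127.0)
```

In this example the outputs already touch the top of the 8-bit range, so the expanded range yields a smaller scaling factor that leaves headroom for larger activations.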

    SCHEDULING OF INFERENCE MODELS BASED ON PREEMPTABLE BOUNDARIES

    Publication (announcement) number: US20230252328A1

    Publication (announcement) date: 2023-08-10

    Application number: US18153764

    Filing date: 2023-01-12

    CPC classification number: G06N5/048 G06F9/4818

    Abstract: Disclosed herein are systems and methods for inference model scheduling of a multi priority inference model system. A processor determines an interrupt flag has been set indicative of a request to interrupt execution of a first inference model in favor of a second inference model. In response to determining that the interrupt flag has been set, the processor determines a state of the execution of the first inference model based on one or more factors. In response to determining the state of the execution is at a preemptable boundary, the processor deactivates the first inference model and activates the second inference model.
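
The switch at a preemptable boundary can be illustrated with a toy scheduler. The sketch treats each model as a list of named steps and treats a boundary as a step index before which the first model may be paused; resuming the first model after the second finishes is an assumption, since the abstract only describes the deactivation and activation.

```python
def run_with_preemption(first_steps, second_steps, boundaries, interrupt_before):
    """Run the first model, switching to the second at the next preemptable boundary.

    boundaries: step indices before which the first model may be paused (assumed format).
    interrupt_before: step index at which the interrupt flag is set.
    """
    trace = []
    interrupt_flag = False
    for i, step in enumerate(first_steps):
        if i == interrupt_before:
            interrupt_flag = True
        if interrupt_flag and i in boundaries:
            # At a preemptable boundary: deactivate the first model, run the second.
            trace.extend(f"second:{s}" for s in second_steps)
            interrupt_flag = False
        trace.append(f"first:{step}")
    if interrupt_flag:
        # No boundary was reached after the interrupt; run the second model at the end.
        trace.extend(f"second:{s}" for s in second_steps)
    return trace


trace = run_with_preemption(["a", "b", "c", "d"], ["x"], boundaries={0, 2}, interrupt_before=1)
```

The interrupt arrives before step `b`, which is not a boundary, so the first model continues until the boundary before step `c`, where the second model runs.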

    Methods and systems for masking multimedia data

    Publication (announcement) number: US11368699B2

    Publication (announcement) date: 2022-06-21

    Application number: US17106954

    Filing date: 2020-11-30

    Abstract: Several methods and systems for masking multimedia data are disclosed. In an embodiment, a method for masking includes performing a prediction for at least one multimedia data block based on a prediction mode of a plurality of prediction modes. The at least one multimedia data block is associated with a region of interest (ROI). A residual multimedia data associated with the at least one multimedia data block is generated based on the prediction. A quantization of the residual multimedia data is performed based on a quantization parameter (QP) value. The QP value is variable such that varying the QP value controls a degree of masking of the ROI.

    DYNAMIC QUANTIZATION FOR DEEP NEURAL NETWORK INFERENCE SYSTEM AND METHOD

    Publication (announcement) number: US20210150248A1

    Publication (announcement) date: 2021-05-20

    Application number: US17128365

    Filing date: 2020-12-21

    Abstract: A method for dynamically quantizing feature maps of a received image. The method includes convolving an image based on a predicted maximum value, a predicted minimum value, trained kernel weights and the image data. The input data is quantized based on the predicted minimum value and predicted maximum value. The output of the convolution is computed into an accumulator and re-quantized. The re-quantized value is output to an external memory. The predicted min value and the predicted max value are computed based on the previous max values and min values with a weighted average or a pre-determined formula. Initial min value and max value are computed based on known quantization methods and utilized for initializing the predicted min value and predicted max value in the quantization process.
