DECREASED QUANTIZATION LATENCY
    1.
    Invention Publication

    Publication No.: US20230410255A1

    Publication Date: 2023-12-21

    Application No.: US18251220

    Filing Date: 2021-01-22

    CPC classification number: G06T3/4046 G06F9/5027

    Abstract: Systems and techniques are described herein for decreasing quantization latency. In some aspects, a process includes determining a first integer data type of data that at least one layer of a neural network is configured to process, and determining a second integer data type of data received for processing by the neural network. The second integer data type can be different from the first integer data type. The process further includes determining a ratio between a first size of the first integer data type and a second size of the second integer data type, and scaling parameters of the at least one layer of the neural network using a scaling factor corresponding to the ratio. The process further includes quantizing the scaled parameters of the neural network, and inputting the received data to the neural network with the quantized and scaled parameters.
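
The steps in the abstract can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: it assumes the "size" of an integer data type means its number of representable values, that the scaling factor equals the ratio itself, and that the scaled parameters are quantized with a simple symmetric per-tensor int8 scheme. The helper names `type_ratio` and `quantize_symmetric_int8` are hypothetical and do not come from the source.

```python
import numpy as np


def type_ratio(layer_dtype, input_dtype):
    """Ratio of the layer dtype's value count to the input dtype's value count.

    Assumption: 'size' of an integer type = number of representable values.
    """
    layer_info = np.iinfo(layer_dtype)
    input_info = np.iinfo(input_dtype)
    layer_size = int(layer_info.max) - int(layer_info.min) + 1
    input_size = int(input_info.max) - int(input_info.min) + 1
    return layer_size / input_size


def quantize_symmetric_int8(weights):
    """Symmetric per-tensor int8 quantization (an assumed scheme).

    Returns the int8 weights and the step size needed to dequantize them.
    """
    qmax = 127
    max_abs = float(np.max(np.abs(weights)))
    step = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / step), -qmax, qmax).astype(np.int8)
    return q, step


# Float parameters of a layer that was configured for uint8-range data.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 3)).astype(np.float32)

# Steps 1-3: determine both integer types and the ratio between their sizes.
ratio = type_ratio(np.uint8, np.uint16)  # 256 / 65536

# Step 4: scale the layer parameters by the factor derived from the ratio.
scaled_weights = weights * ratio

# Step 5: quantize the scaled parameters.
q_weights, step = quantize_symmetric_int8(scaled_weights)

# Step 6: feed the received uint16 data directly, with no per-input rescaling.
x16 = np.array([0, 1024, 65535], dtype=np.uint16)
acc = q_weights.astype(np.int64) @ x16.astype(np.int64)  # integer accumulate
y = acc * step  # dequantized layer output

# Reference: rescale the input into the uint8 range first, then apply the
# original float weights. The two results should agree up to quantization error.
y_ref = weights.astype(np.float64) @ (x16.astype(np.float64) * ratio)
```

Under these assumptions, the rescaling cost is paid once on the parameters rather than on every incoming input, which is one plausible way such a scheme could reduce latency.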
