HALO TRANSFER FOR CONVOLUTION WORKLOAD PARTITION

    公开(公告)号:US20230116629A1

    公开(公告)日:2023-04-13

    申请号:US18046256

    申请日:2022-10-13

    Abstract: A DNN accelerator includes multiple compute tiles for sharing a workload of running a convolution. A halo pipeline in a compute tile can facilitate replications of halo data from the compute tile where the halo data is generated into another compute tile. The halo pipeline may receive a memory transaction for writing a data block. The halo pipeline may determine that the data block falls into a halo region in an input tensor of the convolution. The halo pipeline may generate a remote address for storing the data block in a memory of the other compute tile, e.g., based on a local address of the data block in a memory of the compute tile. The halo pipeline may adjust the remote address, e.g., based on a difference in dimensions of a tensor to be used by the compute tile and a tensor to be used by the other compute tile.

    DEEP NEURAL NETWORK (DNN) ACCELERATOR FACILITATING QUANTIZED INFERENCE

    公开(公告)号:US20230059976A1

    公开(公告)日:2023-02-23

    申请号:US18047415

    申请日:2022-10-18

    Abstract: An DNN accelerator may include a PE array performing MAC operations. The PE array may include PEs capable of MAC operations on quantized values. A PE may include subtractors for subtracting zeropoints from quantized activations and quantized weights to generate intermediate activations and intermediate weights. The intermediate activations and intermediate weights may be stored in data storage units in the PE and maybe used by an MAC unit in the PE. The subtractors may be placed outside the MAC unit but inside the PE. The MAC unit may perform sequential cycles of MAC operations. The MAC unit may include a plurality of multipliers. The intermediate activations and intermediate weights stored in the data storage units may be reused by different multipliers in different cycles of MAC operations. An output of the MAC unit or of the PE may be multiplied with a quantization scale to produce a floating-point value.

Patent Agency Ranking