HYBRID MULTIPY-ACCUMULATION OPERATION WITH COMPRESSED WEIGHTS

    公开(公告)号:US20230229917A1

    公开(公告)日:2023-07-20

    申请号:US18184101

    申请日:2023-03-15

    CPC classification number: G06N3/08 G06F7/5443

    Abstract: A compute block can perform hybrid multiply-accumulate (MAC) operations. The compute block may include a weight compressing module and a processing element (PE) array. The weight compression module may select a first group of one or more weights and a second group of one or more weights from a weight tensor of a DNN (deep neural network) layer. A weight in the first group is quantized to a power of two value. A weight in the second group is quantized to an integer. The integer and the exponent of the power of two value may be stored in a memory in lieu of the original values of the weights. A PE in the PE array includes a shifter configured to shift an activation of the layer by the exponent of the power of two value and a multiplier configured to multiplying the integer with another activation of the layer.

    METHODS AND APPARATUS TO PERFORM LOW OVERHEAD SPARSITY ACCELERATION LOGIC FOR MULTI-PRECISION DATAFLOW IN DEEP NEURAL NETWORK ACCELERATORS

    公开(公告)号:US20220292366A1

    公开(公告)日:2022-09-15

    申请号:US17709337

    申请日:2022-03-30

    Abstract: Methods, apparatus, systems, and articles of manufacture to perform low overhead sparsity acceleration logic for multi-precision dataflow in deep neural network accelerators are disclosed. An example apparatus includes a first buffer to store data corresponding to a first precision; a second buffer to store data corresponding to a second precision; and hardware control circuitry to: process a first multibit bitmap to determine an activation precision of an activation value, the first multibit bitmap including values corresponding to different precisions; process a second multibit bitmap to determine a weight precision of a weight value, the second multibit bitmap including values corresponding to different precisions; and store the activation value and the weight value in the second buffer when at least one of the activation precision or the weight precision corresponds to the second precision.

Patent Agency Ranking