Methods, apparatus, and articles of manufacture to increase data reuse for multiply and accumulate (MAC) operations

    Publication number: US12169643B2

    Publication date: 2024-12-17

    Application number: US18465560

    Filing date: 2023-09-12

    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
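The reuse idea in this abstract can be illustrated with a minimal Python sketch. It is a loose software analogy, not the claimed control logic: a context of the first type is read once and held while a pointer steps through the second buffer, so each held context is reused for several MAC operations. The function names (`mac`, `process_contexts`), the representation of a context as a list of numbers, and the fetch counter are all assumptions made for illustration.

```python
def mac(a, b):
    """Multiply and accumulate two equal-length vectors."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def process_contexts(first_buffer, second_buffer):
    """Pair every first-type context with every second-type context.

    Returns the MAC results keyed by (first index, pointer position) and the
    number of times a first-type context had to be fetched (hypothetical
    bookkeeping, used only to make the reuse visible).
    """
    results = {}
    first_fetches = 0
    for i, first_ctx in enumerate(first_buffer):
        first_fetches += 1  # fetched once, then held in the first buffer
        ptr = 0  # pointer into the second buffer
        while ptr < len(second_buffer):
            # More second-type contexts remain: keep first_ctx, reuse it.
            results[(i, ptr)] = mac(first_ctx, second_buffer[ptr])
            ptr += 1  # iterate the pointer to the next position
    return results, first_fetches


first = [[1, 2], [3, 4]]
second = [[1, 0], [0, 1], [1, 1]]
results, fetches = process_contexts(first, second)
```

With two first-type contexts and three second-type contexts, the sketch performs six MAC operations while fetching each first-type context only once.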

    METHODS AND APPARATUS FOR SPARSE TENSOR STORAGE FOR NEURAL NETWORK ACCELERATORS

    Publication number: US20240134786A1

    Publication date: 2024-04-25

    Application number: US18539955

    Filing date: 2023-12-14

    CPC classification number: G06F12/0207 G06F12/0292 G06N3/10

    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero; static storage controlling circuitry to divide the tensor into one or more storage elements; and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map, and to perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
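The storage scheme described in this abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the patented circuitry: the tensor is treated as a flat list, a storage element is a fixed-size slice, and the helper names (`sparsity_map`, `split_storage_elements`, `compress`, `decompress`) and the `offsets` list are hypothetical. The first step drops zeros according to the sparsity map, and the second step packs the surviving values back to back.

```python
def sparsity_map(values):
    """Return 1 for each nonzero data point and 0 for each zero."""
    return [0 if v == 0 else 1 for v in values]


def split_storage_elements(values, size):
    """Divide a flat tensor into fixed-size storage elements."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def compress(values, size):
    """Remove zeros per storage element, then store the results contiguously."""
    elements = split_storage_elements(values, size)
    maps = [sparsity_map(e) for e in elements]
    packed = []   # contiguous storage of all nonzero values
    offsets = []  # start of each compressed storage element in `packed`
    for element in elements:
        offsets.append(len(packed))
        packed.extend(v for v in element if v != 0)
    return maps, packed, offsets


def decompress(maps, packed, offsets):
    """Rebuild the dense tensor from the sparsity maps and packed values."""
    out = []
    for bits, start in zip(maps, offsets):
        pos = start
        for bit in bits:
            if bit:
                out.append(packed[pos])
                pos += 1
            else:
                out.append(0)
    return out


tensor = [0, 5, 0, 0, 7, 8, 0, 0]
maps, packed, offsets = compress(tensor, size=4)
restored = decompress(maps, packed, offsets)
```

In this example eight data points shrink to three stored values plus the sparsity maps, and the dense tensor can be recovered exactly.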

    DEEP NEURAL NETWORK (DNN) ACCELERATOR FACILITATING QUANTIZED INFERENCE

    Publication number: US20230059976A1

    Publication date: 2023-02-23

    Application number: US18047415

    Filing date: 2022-10-18

    Abstract: A DNN accelerator may include a PE array performing MAC operations. The PE array may include PEs capable of MAC operations on quantized values. A PE may include subtractors for subtracting zeropoints from quantized activations and quantized weights to generate intermediate activations and intermediate weights. The intermediate activations and intermediate weights may be stored in data storage units in the PE and may be used by a MAC unit in the PE. The subtractors may be placed outside the MAC unit but inside the PE. The MAC unit may perform sequential cycles of MAC operations. The MAC unit may include a plurality of multipliers. The intermediate activations and intermediate weights stored in the data storage units may be reused by different multipliers in different cycles of MAC operations. An output of the MAC unit or of the PE may be multiplied with a quantization scale to produce a floating-point value.
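The arithmetic described in this abstract can be written out as a short Python sketch. It models only the data flow (zero-point subtraction, integer MAC, final scaling), not the hardware arrangement of subtractors, storage units, and multipliers. The function name `quantized_dot` and the use of a single combined `scale` (for instance the product of an activation scale and a weight scale) are assumptions for illustration.

```python
def quantized_dot(q_acts, q_weights, act_zp, wt_zp, scale):
    """Dot product of quantized activations and weights.

    Zero points are subtracted once, before the MAC loop, so the resulting
    intermediate values can be stored and reused across MAC cycles.
    """
    inter_acts = [a - act_zp for a in q_acts]     # intermediate activations
    inter_wts = [w - wt_zp for w in q_weights]    # intermediate weights
    acc = 0
    for a, w in zip(inter_acts, inter_wts):
        acc += a * w  # integer multiply and accumulate
    return acc * scale  # apply the quantization scale to get a float


value = quantized_dot([130, 128, 126], [3, 5, 1], act_zp=128, wt_zp=2, scale=0.5)
```

Here the intermediate activations are [2, 0, -2] and the intermediate weights are [1, 3, -1], so the integer accumulator is 4 and the scaled result is 2.0.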

    METHODS AND APPARATUS FOR SPARSE TENSOR STORAGE FOR NEURAL NETWORK ACCELERATORS

    Publication number: US20210406164A1

    Publication date: 2021-12-30

    Application number: US17359217

    Filing date: 2021-06-25

    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for sparse tensor storage for neural network accelerators. An example apparatus includes sparsity map generating circuitry to generate a sparsity map corresponding to a tensor, the sparsity map to indicate whether a data point of the tensor is zero; static storage controlling circuitry to divide the tensor into one or more storage elements; and a compressor to perform a first compression of the one or more storage elements to generate one or more compressed storage elements, the first compression to remove zero points of the one or more storage elements based on the sparsity map, and to perform a second compression of the one or more compressed storage elements, the second compression to store the one or more compressed storage elements contiguously in memory.
