METHODS, SYSTEMS, ARTICLES OF MANUFACTURE, AND APPARATUS TO DECODE ZERO-VALUE-COMPRESSION DATA VECTORS

    公开(公告)号:US20240022259A1

    公开(公告)日:2024-01-18

    申请号:US18465495

    申请日:2023-09-12

    CPC classification number: H03M7/3082 G06F16/2237 G06N3/063 G06N3/08

    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.

    METHODS AND APPARATUS TO PERFORM LOW OVERHEAD SPARSITY ACCELERATION LOGIC FOR MULTI-PRECISION DATAFLOW IN DEEP NEURAL NETWORK ACCELERATORS

    公开(公告)号:US20220292366A1

    公开(公告)日:2022-09-15

    申请号:US17709337

    申请日:2022-03-30

    Abstract: Methods, apparatus, systems, and articles of manufacture to perform low overhead sparsity acceleration logic for multi-precision dataflow in deep neural network accelerators are disclosed. An example apparatus includes a first buffer to store data corresponding to a first precision; a second buffer to store data corresponding to a second precision; and hardware control circuitry to: process a first multibit bitmap to determine an activation precision of an activation value, the first multibit bitmap including values corresponding to different precisions; process a second multibit bitmap to determine a weight precision of a weight value, the second multibit bitmap including values corresponding to different precisions; and store the activation value and the weight value in the second buffer when at least one of the activation precision or the weight precision corresponds to the second precision.

    RUNTIME CONFIGURABLE REGISTER FILES FOR ARTIFICIAL INTELLIGENCE WORKLOADS

    公开(公告)号:US20220075659A1

    公开(公告)日:2022-03-10

    申请号:US17530156

    申请日:2021-11-18

    Abstract: There is disclosed a system and method of performing an artificial intelligence (AI) inference, including: programming an AI accelerator circuit to solve an AI problem with a plurality of layer-specific register file (RF) size allocations, wherein the AI accelerator circuit comprises processing elements (PEs) with respective associated RFs, wherein the RFs individually are divided into K sub-banks of size B bytes, wherein B and K are integers, and wherein the RFs include circuitry to individually allocate a sub-bank to one of input feature (IF), output feature (OF), or filter weight (FL), and wherein programming the plurality of layer-specific RF size allocations comprises accounting for sparse data within the layer; and causing the AI accelerator circuit to execute the AI problem, including applying the layer-specific RF size allocations at run-time.

Patent Agency Ranking