DATAFLOW ACCELERATOR ARCHITECTURE FOR GENERAL MATRIX-MATRIX MULTIPLICATION AND TENSOR COMPUTATION IN DEEP LEARNING

    Publication No.: US20210374210A1

    Publication Date: 2021-12-02

    Application No.: US17374988

    Filing Date: 2021-07-13

    Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D-stacked DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The lookup table buffers determine a first product of the first vector and the second vector without performing a multiply operation. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The embodiments include a hierarchical lookup architecture to reduce latency, and accumulation results are propagated in a systolic manner.
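    The abstract describes replacing hardware multipliers with table reads: one operand selects a row of a precomputed product table, the other selects a column, and adders accumulate the looked-up entries. The following is a minimal software sketch of that idea, assuming 4-bit unsigned operands; the table size, bit width, and function names are illustrative assumptions, not details taken from the patent.

```python
# Sketch of lookup-table-based dot product (illustrative, not the patented design).
BITS = 4
# Precomputed multiplication table: entry [a][b] holds a * b.
LUT = [[a * b for b in range(1 << BITS)] for a in range(1 << BITS)]

def lut_dot_product(vec_a, vec_b):
    """Dot product in which every elementwise product is a table read:
    vec_a[i] supplies the row address, vec_b[i] the column address."""
    assert len(vec_a) == len(vec_b)
    acc = 0
    for a, b in zip(vec_a, vec_b):
        acc += LUT[a][b]        # adders sum the looked-up partial products
    return acc

print(lut_dot_product([1, 2, 3], [4, 5, 6]))  # -> 32
```

    The design choice being modeled is trading multiplier area for memory reads that are cheap inside a DRAM die; the accumulation itself still uses ordinary adders.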

    DATAFLOW ACCELERATOR ARCHITECTURE FOR GENERAL MATRIX-MATRIX MULTIPLICATION AND TENSOR COMPUTATION IN DEEP LEARNING

    Publication No.: US20200183837A1

    Publication Date: 2020-06-11

    Application No.: US16388863

    Filing Date: 2019-04-18

    Abstract: A tensor computation dataflow accelerator semiconductor circuit is disclosed. The dataflow accelerator includes a DRAM bank and a peripheral array of multiply-and-add units disposed adjacent to the DRAM bank. The peripheral array of multiply-and-add units is configured to form a pipelined dataflow chain in which partial output data from one multiply-and-add unit from among the array of multiply-and-add units is fed into another multiply-and-add unit from among the array of multiply-and-add units for data accumulation. Near-DRAM-processing dataflow (NDP-DF) accelerator unit dies may be stacked atop a base die. The base die may be disposed on a passive silicon interposer adjacent to a processor or a controller. The NDP-DF accelerator units may process partial matrix output data in parallel. The partial matrix output data may be propagated in a forward or backward direction. The tensor computation dataflow accelerator may perform a partial matrix transposition.
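    The pipelined dataflow chain described above passes a running partial sum from one multiply-and-add unit to the next. Below is a minimal sketch of that accumulation pattern, assuming each unit multiplies one weight by one activation and adds the incoming partial sum; the function names and the parallel-chain wrapper are illustrative assumptions, not the patented circuit.

```python
# Sketch of a pipelined multiply-and-add (MAC) chain (illustrative only).
def mac_unit(weight, activation, partial_in):
    """One multiply-and-add unit: product plus the incoming partial sum."""
    return weight * activation + partial_in

def dataflow_chain(weights, activations):
    """Chain of MAC units: each unit's partial output feeds the next unit."""
    partial = 0
    for w, x in zip(weights, activations):
        partial = mac_unit(w, x, partial)
    return partial

def matvec_via_chains(matrix, vector):
    """Each output element comes from its own chain, loosely modeling
    accelerator units producing partial matrix outputs in parallel."""
    return [dataflow_chain(row, vector) for row in matrix]

print(dataflow_chain([1, 2, 3], [4, 5, 6]))           # -> 32
print(matvec_via_chains([[1, 0], [0, 1], [2, 3]], [4, 5]))  # -> [4, 5, 23]
```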

    DATAFLOW ACCELERATOR ARCHITECTURE FOR GENERAL MATRIX-MATRIX MULTIPLICATION AND TENSOR COMPUTATION IN DEEP LEARNING

    Publication No.: US20200184001A1

    Publication Date: 2020-06-11

    Application No.: US16388860

    Filing Date: 2019-04-18

    Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D-stacked DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The lookup table buffers determine a first product of the first vector and the second vector without performing a multiply operation. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The embodiments include a hierarchical lookup architecture to reduce latency, and accumulation results are propagated in a systolic manner.
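    This abstract also mentions lookup table buffers and a hierarchical lookup architecture for reducing latency. The sketch below illustrates one plausible reading, assuming a buffer caches a single row of the shared product table so that repeated lookups with the same row operand avoid re-reading the full table; the class name, buffer granularity, and 4-bit operand width are assumptions for illustration, not details from the patent.

```python
# Sketch of a hierarchical lookup: a small buffer caches one table row (illustrative).
BITS = 4
LUT = [[a * b for b in range(1 << BITS)] for a in range(1 << BITS)]

class LookupTableBuffer:
    """Caches one row of the shared multiplication table."""
    def __init__(self, table):
        self.table = table
        self.row_addr = None
        self.row = None

    def product(self, row_addr, col_addr):
        if row_addr != self.row_addr:      # fetch the row only on a miss
            self.row_addr = row_addr
            self.row = self.table[row_addr]
        return self.row[col_addr]          # subsequent lookups hit the local buffer

buf = LookupTableBuffer(LUT)
# Repeated row operands reuse the buffered row: 3*4, 3*5, 3*6.
print([buf.product(3, c) for c in (4, 5, 6)])  # -> [12, 15, 18]
```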
