DATAFLOW ACCELERATOR ARCHITECTURE FOR GENERAL MATRIX-MATRIX MULTIPLICATION AND TENSOR COMPUTATION IN DEEP LEARNING

    Publication No.: US20210374210A1

    Publication Date: 2021-12-02

    Application No.: US17374988

    Filing Date: 2021-07-13

    Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The lookup table buffers determine a first product of the first vector and the second vector without performing a multiply operation. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
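    The operating principle described above, replacing hardware multipliers with reads from a product table addressed by the two operands, can be illustrated in software. The sketch below is a minimal illustration assuming 4-bit unsigned operands; the table size, the function name dot_product_via_lut, and the example vectors are assumptions for illustration, not details taken from the patent.

```python
# Minimal sketch of lookup-table-based multiplication, assuming 4-bit unsigned
# operands; table size and names are illustrative, not taken from the patent.

OPERAND_BITS = 4
VALUES = 1 << OPERAND_BITS

# Peripheral lookup table: entry [row][col] holds the product row * col,
# so a "multiply" becomes a single table read addressed by the two operands.
lookup_table = [[row * col for col in range(VALUES)] for row in range(VALUES)]

def dot_product_via_lut(first_vector, second_vector):
    """Dot product using only table lookups and adders (no multiply operator)."""
    acc = 0
    for a, b in zip(first_vector, second_vector):
        # The first vector element selects the row, the second the column.
        acc += lookup_table[a][b]
    return acc

if __name__ == "__main__":
    v1 = [3, 7, 15, 2]
    v2 = [5, 1, 9, 14]
    print(dot_product_via_lut(v1, v2))   # 3*5 + 7*1 + 15*9 + 2*14 = 185
```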

    BANDWIDTH BOOSTED STACKED MEMORY

    Publication No.: US20210141735A1

    Publication Date: 2021-05-13

    Application No.: US17156362

    Filing Date: 2021-01-22

    Abstract: A high bandwidth memory system is disclosed. In some embodiments, the system includes: a memory stack having a plurality of memory dies and eight 128-bit channels; and a logic die, the memory dies being stacked on, and connected to, the logic die; wherein the logic die may be configured to operate a first channel of the 128-bit channels in: a first mode, in which a first 64 bits operate in pseudo-channel mode, and a second 64 bits operate as two 32-bit fine-grain channels, or a second mode, in which the first 64 bits operate as two 32-bit fine-grain channels, and the second 64 bits operate as two 32-bit fine-grain channels.
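    The two channel configurations amount to a choice of sub-channel widths for each 128-bit channel. The sketch below is a minimal illustration under that reading; the mode labels and the configure_channel function are assumptions for illustration, not the logic die's actual interface.

```python
# Minimal sketch of the two channel configurations from the abstract, modeling
# each 128-bit channel only by the widths of its sub-channels (illustrative).

def configure_channel(mode):
    """Return sub-channel widths (in bits) for one 128-bit channel."""
    if mode == "first":
        # First 64 bits as one pseudo channel, second 64 bits as two
        # 32-bit fine-grain channels.
        return [64, 32, 32]
    if mode == "second":
        # Both halves split into 32-bit fine-grain channels.
        return [32, 32, 32, 32]
    raise ValueError(f"unknown mode: {mode}")

if __name__ == "__main__":
    for mode in ("first", "second"):
        widths = configure_channel(mode)
        assert sum(widths) == 128
        print(mode, widths)
```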

    HOST-BASED AND CLIENT-BASED COMMAND SCHEDULING IN LARGE BANDWIDTH MEMORY SYSTEMS

    Publication No.: US20210117103A1

    Publication Date: 2021-04-22

    Application No.: US17133987

    Filing Date: 2020-12-24

    Abstract: A high-bandwidth memory (HBM) system includes an HBM device and a logic circuit. The logic circuit receives a first command from a host device and converts the received first command to a processing-in-memory (PIM) command that is sent to the HBM device through a second interface. A time between when the first command is received from the host device and when the HBM system is ready to receive another command from the host device is deterministic. The logic circuit further receives a fourth command and a fifth command from the host device. The fifth command requests time-estimate information relating to a time between when the fifth command is received and when the HBM system is ready to receive another command from the host device. The time-estimate information includes a deterministic period of time and an estimated period of time for a non-deterministic period of time.
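    The time-estimate response splits command latency into a deterministic part and an estimate for the non-deterministic part. The sketch below illustrates only that split; the TimeEstimate fields, the query_time_estimate function, and all latency figures are assumptions for illustration, not values from the patent.

```python
# Minimal sketch of a time-estimate response: a deterministic component plus an
# estimate for the non-deterministic component. Names and numbers are illustrative.

from dataclasses import dataclass

@dataclass
class TimeEstimate:
    deterministic_ns: int   # fixed latency known in advance
    estimated_ns: int       # best-effort estimate for the variable part

    @property
    def total_ns(self):
        return self.deterministic_ns + self.estimated_ns

def query_time_estimate(pim_command):
    """Hypothetical host-side query: how long until the HBM system is ready?"""
    fixed = 120                       # placeholder for decode/issue latency
    variable = 80 * len(pim_command)  # placeholder: scales with operand count
    return TimeEstimate(fixed, variable)

if __name__ == "__main__":
    est = query_time_estimate(["load", "mac", "store"])
    print(est.deterministic_ns, est.estimated_ns, est.total_ns)
```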

    COORDINATED IN-MODULE RAS FEATURES FOR SYNCHRONOUS DDR COMPATIBLE MEMORY

    Publication No.: US20200218447A1

    Publication Date: 2020-07-09

    Application No.: US16819032

    Filing Date: 2020-03-13

    Abstract: A memory module includes a memory array, an interface and a controller. The memory array includes an array of memory cells and is configured as a dual in-line memory module (DIMM). The DIMM includes a plurality of connections that have been repurposed from a standard DIMM pin out configuration to interface operational status of the memory device to a host device. The interface is coupled to the memory array and the plurality of connections of the DIMM to interface the memory array to the host device. The controller is coupled to the memory array and the interface, and controls at least one of a refresh operation of the memory array, an error-correction operation of the memory array, a memory scrubbing operation of the memory array, and a wear-leveling operation of the memory array. The controller also interfaces with the host device.
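    One way to picture the repurposed-pin status interface is a module-side controller that raises a busy indication while an in-module RAS operation runs and clears it when the module is ready for host traffic again. The sketch below is a minimal illustration; the pin names, the operation list, and run_ras_operation are hypothetical, not taken from the patent.

```python
# Minimal sketch of a module-side controller reporting RAS operation status
# over pins repurposed from the standard DIMM pin-out. Names are illustrative.

STATUS_PINS = {"ALERT_STATUS": 0, "BUSY_STATUS": 0}   # hypothetical repurposed pins

RAS_OPERATIONS = ("refresh", "ecc_correct", "scrub", "wear_level")

def run_ras_operation(op):
    """Run one in-module operation and expose its status to the host."""
    if op not in RAS_OPERATIONS:
        raise ValueError(f"unsupported operation: {op}")
    STATUS_PINS["BUSY_STATUS"] = 1   # host sees that the module is busy
    # ... the module performs the operation internally ...
    STATUS_PINS["BUSY_STATUS"] = 0   # host may resume synchronous DDR traffic

if __name__ == "__main__":
    for op in RAS_OPERATIONS:
        run_ras_operation(op)
        print(op, STATUS_PINS)
```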

    HETEROGENEOUS ACCELERATOR FOR HIGHLY EFFICIENT LEARNING SYSTEMS

    Publication No.: US20190079886A1

    Publication Date: 2019-03-14

    Application No.: US15825047

    Filing Date: 2017-11-28

    CPC classification number: G06F13/28 G06F9/445 G06F9/4806 G06F2015/768

    Abstract: An apparatus may include a heterogeneous computing environment that may be controlled, at least in part, by a task scheduler. The heterogeneous computing environment may include: a processing unit having fixed logical circuits configured to execute instructions; a reprogrammable processing unit having reprogrammable logical circuits configured to execute instructions, including instructions to control processing-in-memory functionality; and a stack of high-bandwidth memory dies, each configured to store data and to provide processing-in-memory functionality controllable by the reprogrammable processing unit, such that the reprogrammable processing unit is at least partially stacked with the high-bandwidth memory dies. The task scheduler may be configured to schedule computational tasks between the processing unit and the reprogrammable processing unit.
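    The scheduling idea can be sketched as a dispatcher that sends memory-bound work to the reprogrammable unit (which also drives processing-in-memory in the stacked dies) and compute-bound work to the fixed-logic unit. The dispatch heuristic, the Task attributes, and the schedule function below are assumptions for illustration; the patent does not specify a particular dispatch rule.

```python
# Minimal sketch of a task scheduler dispatching work between a fixed-logic
# processing unit and a reprogrammable/PIM-capable unit. The heuristic and
# all names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    memory_bound: bool   # heuristic: memory-bound work benefits from PIM

def schedule(tasks):
    """Split tasks between the fixed unit and the reprogrammable/PIM unit."""
    fixed_unit, reprogrammable_unit = [], []
    for task in tasks:
        (reprogrammable_unit if task.memory_bound else fixed_unit).append(task.name)
    return {"fixed_unit": fixed_unit, "reprogrammable_unit": reprogrammable_unit}

if __name__ == "__main__":
    tasks = [Task("dense_gemm", False), Task("embedding_lookup", True),
             Task("activation", False), Task("sparse_gather", True)]
    print(schedule(tasks))
```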
