Heterogeneous accelerator for highly efficient learning systems

    Publication No.: US11226914B2

    Publication Date: 2022-01-18

    Application No.: US16595452

    Application Date: 2019-10-07

    Abstract: An apparatus may include a heterogeneous computing environment that may be controlled, at least in part, by a task scheduler. The heterogeneous computing environment may include a processing unit having fixed logical circuits configured to execute instructions; a reprogrammable processing unit having reprogrammable logical circuits configured to execute instructions, including instructions to control processing-in-memory functionality; and a stack of high-bandwidth memory dies, each of which may be configured to store data and to provide processing-in-memory functionality controllable by the reprogrammable processing unit, such that the reprogrammable processing unit is at least partially stacked with the high-bandwidth memory dies. The task scheduler may be configured to schedule computational tasks between the processing unit and the reprogrammable processing unit.
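The scheduling idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented design: the unit names and the dispatch rule (route tasks needing processing-in-memory to the reprogrammable unit, everything else to the fixed-logic unit) are assumptions for the example.

```python
# Hypothetical sketch of the described task scheduling; the dispatch
# heuristic and unit names are illustrative assumptions.

class Task:
    def __init__(self, name, uses_pim):
        self.name = name
        self.uses_pim = uses_pim  # needs processing-in-memory support

def schedule(tasks):
    """Route PIM tasks to the reprogrammable unit stacked with the
    HBM dies; everything else runs on the fixed-logic processing unit."""
    assignments = {}
    for task in tasks:
        unit = "reprogrammable_pim_unit" if task.uses_pim else "fixed_processing_unit"
        assignments[task.name] = unit
    return assignments

plan = schedule([Task("gemm", uses_pim=True), Task("control", uses_pim=False)])
```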

    HBM BASED MEMORY LOOKUP ENGINE FOR DEEP LEARNING ACCELERATOR

    Publication No.: US20210405877A1

    Publication Date: 2021-12-30

    Application No.: US17473532

    Application Date: 2021-09-13

    Abstract: A storage device and method of controlling a storage device are disclosed. The storage device includes a host, a logic die, and a high bandwidth memory stack including a memory die. A computation lookup table is stored on a memory array of the memory die. The host sends a command to perform an operation utilizing a kernel and a plurality of input feature maps; the operation includes finding the product of a weight of the kernel and values of multiple input feature maps. The computation lookup table includes a row corresponding to a weight of the kernel and a column corresponding to a value of the input feature maps. A result value stored at a position corresponding to a row and a column is the product of the weight corresponding to the row and the value corresponding to the column.
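The table layout described above replaces in-logic multiplication with a memory read: rows are indexed by weight, columns by feature value, and each cell holds the precomputed product. A toy model, assuming small integer weights and values for illustration:

```python
# Illustrative model of the computation lookup table: rows index kernel
# weights, columns index feature-map values, each cell holds the product,
# so a multiply becomes a table read. Value ranges here are assumptions.

weights = range(4)   # possible kernel weight values
values = range(4)    # possible input-feature-map values
lut = [[w * v for v in values] for w in weights]

def mac(kernel_weight, feature_values):
    # Sum of products looked up from the table instead of multiplied.
    return sum(lut[kernel_weight][v] for v in feature_values)

result = mac(3, [1, 2, 3])   # 3*1 + 3*2 + 3*3 = 18
```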

    PSEUDO MAIN MEMORY SYSTEM

    Publication No.: US20200042435A1

    Publication Date: 2020-02-06

    Application No.: US16600313

    Application Date: 2019-10-11

    Abstract: A pseudo main memory system is disclosed. The system includes a memory adapter circuit for performing memory augmentation using compression, deduplication, and/or error correction. The memory adapter circuit is connected to a memory, and employs the memory augmentation methods to increase the effective storage capacity of the memory. The memory adapter circuit is also connected to a memory bus and implements an NVDIMM-F or modified NVDIMM-F interface for connecting to the memory bus.
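Of the augmentation methods named above, deduplication is the easiest to illustrate: identical blocks are stored once physically while remaining addressable at multiple logical addresses. A toy model, with the table structure assumed for the example:

```python
# Toy model of deduplication-based memory augmentation (structure assumed):
# identical blocks are stored once, so effective capacity can exceed
# physical capacity.

class DedupMemory:
    def __init__(self):
        self.blocks = {}   # content hash -> stored block (physical)
        self.table = {}    # logical address -> content hash

    def write(self, addr, block):
        key = hash(block)
        self.blocks.setdefault(key, block)   # store content only once
        self.table[addr] = key

    def read(self, addr):
        return self.blocks[self.table[addr]]

mem = DedupMemory()
mem.write(0, b"zeros" * 100)
mem.write(1, b"zeros" * 100)       # duplicate: no new physical block
physical_blocks = len(mem.blocks)  # two logical writes, one stored block
```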

    Memory apparatus for in-place regular expression search

    Publication No.: US10282436B2

    Publication Date: 2019-05-07

    Application No.: US15470709

    Application Date: 2017-03-27

    Abstract: A method of searching for data stored in a memory, the method including receiving a regex search request, generating a parse tree including fundamental regex operations corresponding to the regex search request, individually analyzing each of the fundamental regex operations of the generated parse tree in a respective time-step, determining a memory address location of data corresponding to the analyzed fundamental regex operations by using a translation table to determine whether the data exists, and using a reverse translation table to determine the memory address location of the data, and outputting data matching the regex search request after analyzing all of the fundamental regex operations of the generated parse tree.
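The lookup flow in the abstract — a translation table answering whether data exists, and a reverse translation table mapping data to its address — can be sketched for a single fundamental operation (a literal match). Table contents and layouts here are assumptions for illustration:

```python
# Minimal sketch of the in-place lookup flow (table layouts assumed):
# the translation table answers "does this value exist in memory?", the
# reverse translation table maps an existing value to its address(es).

translation = {"cat": True, "car": True}              # value -> exists?
reverse_translation = {"cat": [0x10], "car": [0x20]}  # value -> addresses

def match_literal(value):
    """One fundamental regex operation (literal match), one time-step."""
    if not translation.get(value, False):
        return []
    return reverse_translation[value]

# Evaluate a tiny parse tree for the alternation cat|dog, one op per step.
matches = match_literal("cat") + match_literal("dog")
```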

    QUASI-SYNCHRONOUS PROTOCOL FOR LARGE BANDWIDTH MEMORY SYSTEMS

    Publication No.: US20190079678A1

    Publication Date: 2019-03-14

    Application No.: US15821688

    Application Date: 2017-11-22

    Abstract: A high-bandwidth memory (HBM) system includes an HBM device and a logic circuit. The logic circuit includes a first interface coupled to a host device and a second interface coupled to the HBM device. The logic circuit receives a first command from the host device through the first interface and converts the received first command to a first processing-in-memory (PIM) command that is sent to the HBM device through the second interface. The first PIM command has a deterministic latency for completion. The logic circuit further receives a second command from the host device through the first interface and converts the received second command to a second PIM command that is sent to the HBM device through the second interface. The second PIM command has a non-deterministic latency for completion.
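The command conversion with a latency classification can be sketched as below. The command names and the deterministic/non-deterministic split shown are assumptions; the abstract only establishes that both classes exist.

```python
# Hedged sketch of host-to-PIM command conversion; the operation names
# and their latency classes are illustrative assumptions.

DETERMINISTIC_OPS = {"pim_add", "pim_copy"}   # fixed completion latency
NON_DETERMINISTIC_OPS = {"pim_search"}        # latency depends on data

def convert(host_command):
    """Translate a host command into a PIM command with a latency class."""
    op = "pim_" + host_command
    if op in DETERMINISTIC_OPS:
        return {"op": op, "latency": "deterministic"}
    if op in NON_DETERMINISTIC_OPS:
        return {"op": op, "latency": "non-deterministic"}
    raise ValueError("unsupported host command: " + host_command)

first = convert("add")      # deterministic latency
second = convert("search")  # non-deterministic latency
```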

    Unified object interface for memory and storage system

    Publication No.: US10169124B2

    Publication Date: 2019-01-01

    Application No.: US14733895

    Application Date: 2015-06-08

    Abstract: A data structure and a mechanism to manage storage of objects is disclosed. The data structure can be used to manage storage of objects on any storage device, whether in memory or some other storage device. Given an object ID (OID) for an object, the system can identify a tuple that includes a device ID and an address. The device ID specifies the device storing the object, and the address specifies the address on the device where the object is stored. The application can then access the object using the device ID and the address.
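The OID-to-tuple mapping described above is essentially a directory lookup. A minimal sketch, with a plain dict standing in for whatever structure the system actually uses:

```python
# Illustrative object directory for the OID -> (device ID, address) tuple;
# the dict and the identifiers below are assumptions for the example.

directory = {}   # object ID -> (device ID, address)

def store(oid, device_id, address):
    directory[oid] = (device_id, address)

def locate(oid):
    """Resolve an object ID to the device and the on-device address."""
    return directory[oid]

store(oid=42, device_id="nvme0", address=0x1000)
device, addr = locate(42)   # the application accesses the object with these
```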

    Reconfigurable logic architecture

    Publication No.: US09577644B2

    Publication Date: 2017-02-21

    Application No.: US14838347

    Application Date: 2015-08-27

    Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies. The dies may include a memory cell die configured to store data in a random access fashion. The dies may also include a look-up table die comprising a random access memory array that, in turn, includes a reconfigurable look-up table. The reconfigurable look-up table may be configured to perform a logic function. The reconfigurable look-up table may include a plurality of random access memory cells configured to store a look-up table to perform a logic function, and a local row decoder configured to activate one or more rows of memory cells based upon a set of input signals. The look-up table stored in the plurality of memory cells may be configured to be dynamically altered via a memory write operation to the random access memory array.
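The core mechanism — a truth table stored in RAM, with the input signals acting as the row address and an ordinary memory write reprogramming the logic function — can be modeled in a few lines. The two-input encoding below is an assumption for illustration:

```python
# Sketch of a RAM-backed look-up table acting as reconfigurable logic:
# inputs select a row, the stored bit is the function's output, and a
# plain memory write reprograms the function (encoding assumed).

def lut_read(table, a, b):
    # The input signals form the row address into the memory array.
    return table[(a << 1) | b]

and_table = [0, 0, 0, 1]         # truth table for AND, one bit per row
out = lut_read(and_table, 1, 1)  # -> 1

# Dynamically alter the function via a memory write: turn it into OR.
or_table = list(and_table)
or_table[1] = or_table[2] = 1    # rows for inputs 01 and 10 now output 1
out2 = lut_read(or_table, 1, 0)  # -> 1
```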

    HBM BASED MEMORY LOOKUP ENGINE FOR DEEP LEARNING ACCELERATOR

    Publication No.: US20250004658A1

    Publication Date: 2025-01-02

    Application No.: US18763864

    Application Date: 2024-07-03

    Abstract: A storage device and method of controlling a storage device are disclosed. The storage device includes a host, a logic die, and a high bandwidth memory stack including a memory die. A computation lookup table is stored on a memory array of the memory die. The host sends a command to perform an operation utilizing a kernel and a plurality of input feature maps; the operation includes finding the product of a weight of the kernel and values of multiple input feature maps. The computation lookup table includes a row corresponding to a weight of the kernel and a column corresponding to a value of the input feature maps. A result value stored at a position corresponding to a row and a column is the product of the weight corresponding to the row and the value corresponding to the column.

    Neural network training with acceleration

    Publication No.: US12136138B2

    Publication Date: 2024-11-05

    Application No.: US17670044

    Application Date: 2022-02-11

    Abstract: A system and method for training a neural network. In some embodiments, the system includes: a graphics processing unit cluster; and a computational storage cluster connected to the graphics processing unit cluster by a cache-coherent system interconnect. The graphics processing unit cluster may include one or more graphics processing units. The computational storage cluster may include one or more computational storage devices. A first computational storage device of the one or more computational storage devices may be configured to (i) store an embedding table, (ii) receive an index vector including a first index and a second index; and (iii) calculate an embedded vector based on: a first row of the embedding table, corresponding to the first index, and a second row of the embedding table, corresponding to the second index.
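The embedded-vector calculation — combining the embedding-table rows selected by the index vector — can be sketched as follows. The abstract only says the result is based on both rows; the summation used here as the pooling operation is an assumption:

```python
# Sketch of the embedded-vector calculation; the pooling choice (row
# summation) and the table contents are assumptions for the example.

embedding_table = [
    [0.0, 1.0],   # row 0
    [2.0, 3.0],   # row 1
    [4.0, 5.0],   # row 2
]

def embed(index_vector):
    """Combine the rows selected by the index vector, here by summation."""
    rows = [embedding_table[i] for i in index_vector]
    return [sum(col) for col in zip(*rows)]

vec = embed([1, 2])   # row 1 + row 2
```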
