-
Publication No.: US20210374210A1
Publication date: 2021-12-02
Application No.: US17374988
Filing date: 2021-07-13
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
IPC: G06F17/16, G06F12/0877, G06F12/0802, G06N3/063, G06N3/00, G06N3/04
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum a first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
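The lookup-based multiplication described in this abstract can be illustrated with a minimal software sketch. It is not the patented circuit: the 4-bit operand width and the names `LUT`, `lut_dot`, and `lut_gemm` are assumptions made for illustration only. One operand selects the table row, the other selects the column, and products are only ever read from the table and summed.

```python
BITS = 4            # assumed operand width; the abstract does not state one
SIZE = 1 << BITS    # number of rows and columns in the table

# Product table filled once, ahead of time. The row index is the first
# operand and the column index is the second operand.
LUT = [[a * b for b in range(SIZE)] for a in range(SIZE)]


def lut_dot(vec_a, vec_b):
    """Dot product that uses only table lookups and additions at run time."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have equal length")
    acc = 0
    for a, b in zip(vec_a, vec_b):
        acc += LUT[a][b]  # the lookup stands in for the multiply
    return acc


def lut_gemm(A, B):
    """Matrix product C = A x B assembled from lut_dot calls."""
    cols_b = list(zip(*B))  # columns of B
    return [[lut_dot(row, col) for col in cols_b] for row in A]
```

For example, `lut_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. All operands must fit in the assumed 4-bit range.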
-
Publication No.: US20160307619A1
Publication date: 2016-10-20
Application No.: US14850938
Filing date: 2015-09-10
Applicant: Samsung Electronics Co., Ltd.
Inventor: Mu-Tien CHANG, Krishna MALLADI, Dimin NIU, Hongzhong ZHENG
IPC: G11C11/406, G11C11/4076
CPC classification number: G11C11/40615, G11C5/04, G11C5/14, G11C11/40618, G11C11/4076
Abstract: A Dynamic Random Access Memory (DRAM) module (105) is disclosed. The DRAM module (105) can include a plurality of banks (205-1, 205-2, 205-3, 205-4) to store data and a refresh engine (115) that can be used to refresh one of the plurality of banks (205-1, 205-2, 205-3, 205-4). The DRAM module (105) can also include a Smart Refresh Component (305) that can advise the refresh engine (115) which bank to refresh using an out-of-order per-bank refresh. The Smart Refresh Component (305) can use logic (415) to identify the farthest bank among the pending transactions in the transaction queue (430) at the time of refresh.
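One possible reading of the "farthest bank" selection is sketched below. The function name `pick_refresh_bank`, the queue representation (a list of bank indices, head first), and the rule that a bank with no pending access counts as infinitely far away are all assumptions, not details taken from the patent.

```python
def pick_refresh_bank(num_banks, transaction_queue):
    """Return the bank whose first pending access is farthest from the queue head.

    transaction_queue lists bank indices in service order, head first.
    A bank with no pending access is treated as infinitely far away
    (an assumption of this sketch). Ties go to the lowest bank index.
    """
    first_use = {}
    for position, bank in enumerate(transaction_queue):
        first_use.setdefault(bank, position)  # keep only the earliest position
    return max(range(num_banks), key=lambda b: first_use.get(b, float("inf")))
```

With four banks and a queue of `[0, 1, 0, 2, 1]`, bank 3 has no pending access and is chosen, so the refresh does not delay any queued transaction.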
-
Publication No.: US20250077370A1
Publication date: 2025-03-06
Application No.: US18953042
Filing date: 2024-11-19
Applicant: Samsung Electronics Co., Ltd.
Inventor: Dimin NIU, Krishna MALLADI, Hongzhong ZHENG
Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies that is configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine whether a memory access is associated with an error and, if so, to complete the memory access with the spare memory.
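The address-table redirection in this abstract can be modelled in a few lines. The class below is a hypothetical software stand-in, not the disclosed hardware: `ReliabilityCircuit`, `mark_faulty`, and the dictionary used for the memory cell die are illustrative assumptions, and how errors are detected is left out entirely.

```python
class ReliabilityCircuit:
    """Toy model: accesses to addresses flagged as faulty go to spare memory."""

    def __init__(self, spare_slots):
        self.main = {}                      # stands in for the memory cell die
        self.spare = [None] * spare_slots   # spare memory on the logic die
        self.address_table = {}             # faulty address -> spare slot index

    def mark_faulty(self, address):
        """Map an address associated with an error to the next free spare slot."""
        if address in self.address_table:
            return
        if len(self.address_table) >= len(self.spare):
            raise RuntimeError("no spare slots left")
        self.address_table[address] = len(self.address_table)

    def write(self, address, value):
        slot = self.address_table.get(address)
        if slot is None:
            self.main[address] = value      # normal path
        else:
            self.spare[slot] = value        # redirected path

    def read(self, address):
        slot = self.address_table.get(address)
        if slot is None:
            return self.main.get(address)
        return self.spare[slot]
```

After `mark_faulty(0x20)`, a write to `0x20` followed by a read of `0x20` is served entirely from the spare memory, and the main store is never touched for that address.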
-
Publication No.: US20190214365A1
Publication date: 2019-07-11
Application No.: US15911063
Filing date: 2018-03-02
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng GU, Krishna MALLADI, Hongzhong ZHENG
IPC: H01L25/065, H01L31/12, H01L31/02, H01L31/0232, H01L25/18, G02F1/01, H04B10/80, H04Q11/00
Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logic circuit die and configured to transfer the second data between the memory circuit die and the logic circuit die.
-
Publication No.: US20170351453A1
Publication date: 2017-12-07
Application No.: US15230322
Filing date: 2016-08-05
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Krishna MALLADI, Hongzhong ZHENG
IPC: G06F3/06
CPC classification number: G06F3/0631, G06F3/0604, G06F3/0608, G06F3/0644, G06F3/0661, G06F3/0665, G06F3/0683, G06F11/00, G06F12/023
Abstract: A memory module includes one or more memory devices, a memory interface to a host computer, and a memory overprovisioning logic. The memory overprovisioning logic is configured to monitor memory usage of the one or more memory devices and provide a compression and/or deduplication ratio of the memory module to a kernel driver module of the host computer. The kernel driver module of the host computer is configured to update a virtual memory capacity of the memory module based on the compression and/or deduplication ratio.
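The capacity update described here amounts to simple arithmetic, sketched below under stated assumptions. The abstract gives no formula, so scaling the physical capacity by the reported ratio is only one plausible interpretation, and the function names `measured_ratio` and `virtual_capacity` are hypothetical.

```python
def measured_ratio(logical_bytes_stored, physical_bytes_used):
    """Compression and/or deduplication ratio observed by the module (assumed definition)."""
    if physical_bytes_used == 0:
        return 1.0  # nothing stored yet, so advertise no gain
    return logical_bytes_stored / physical_bytes_used


def virtual_capacity(physical_bytes, ratio):
    """Capacity a kernel driver could advertise, assuming it scales linearly with the ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return int(physical_bytes * ratio)
```

For instance, a module holding 300 logical bytes in 100 physical bytes reports a ratio of 3.0, and a 1000-byte module would then be advertised as 3000 bytes under this assumption.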
-
Publication No.: US20200183837A1
Publication date: 2020-06-11
Application No.: US16388863
Filing date: 2019-04-18
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
IPC: G06F12/0802, G06F17/16
Abstract: A tensor computation dataflow accelerator semiconductor circuit is disclosed. The dataflow accelerator includes a DRAM bank and a peripheral array of multiply-and-add units disposed adjacent to the DRAM bank. The peripheral array of multiply-and-add units is configured to form a pipelined dataflow chain in which partial output data from one multiply-and-add unit from among the array of multiply-and-add units is fed into another multiply-and-add unit from among the array of multiply-and-add units for data accumulation. Near-DRAM-processing dataflow (NDP-DF) accelerator unit dies may be stacked atop a base die. The base die may be disposed on a passive silicon interposer adjacent to a processor or a controller. The NDP-DF accelerator units may process partial matrix output data in parallel. The partial matrix output data may be propagated in a forward or backward direction. The tensor computation dataflow accelerator may perform a partial matrix transposition.
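The chained accumulation in this abstract, where each multiply-and-add unit forwards its partial output to the next, can be mimicked sequentially in software. The function `mac_chain` and its `trace` output are illustrative assumptions. The sketch ignores pipelining, timing, and the DRAM bank itself.

```python
def mac_chain(weights, inputs, partial_in=0):
    """Pass a running partial sum through a chain of multiply-and-add stages.

    Stage i computes partial_out = partial_in + weights[i] * inputs[i]
    and hands partial_out to stage i + 1. The trace records what each
    stage forwards, which is handy for inspecting the dataflow.
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have equal length")
    partial = partial_in
    trace = []
    for w, x in zip(weights, inputs):
        partial = partial + w * x   # one multiply-and-add unit
        trace.append(partial)       # value fed into the next unit
    return partial, trace
```

Calling `mac_chain([1, 2, 3], [4, 5, 6])` returns `(32, [4, 14, 32])`: the final element of the trace is the accumulated dot product.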
-
Publication No.: US20180121130A1
Publication date: 2018-05-03
Application No.: US15426015
Filing date: 2017-02-06
Applicant: Samsung Electronics Co., Ltd.
Inventor: Shaungchen LI, Dimin NIU, Krishna MALLADI, Hongzhong ZHENG
IPC: G06F3/06, G11C11/4094
CPC classification number: G06F3/0647, G06F3/061, G06F3/0683, G11C7/1006, G11C11/405, G11C11/4091, G11C11/4094, G11C11/4097
Abstract: A system includes a library, a compiler, a driver and at least one dynamic random access memory (DRAM) processing unit (DPU). The library may determine at least one DPU operation corresponding to a received command. The compiler may form at least one DPU instruction for the DPU operation. The driver may send the at least one DPU instruction to at least one DPU. The DPU may include at least one computing cell array that includes a plurality of DRAM-based computing cells arranged in an array having at least one column in which the at least one column may include at least three rows of DRAM-based computing cells configured to provide a logic function that operates on a first row and a second row of the at least three rows and configured to store a result of the logic function in a third row of the at least three rows.
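The three-row arrangement in this abstract, with two operand rows and one result row, can be pictured with a bitwise sketch. The choice of NOR as the logic function and the names `nor` and `dpu_column_op` are assumptions. The abstract says only that a logic function operates on the first and second rows and stores its result in the third.

```python
def nor(a, b):
    """Bitwise NOR of two single bits (the logic function is an assumed example)."""
    return 1 - (a | b)


def dpu_column_op(rows, op, src_a=0, src_b=1, dst=2):
    """Apply op column by column to two source rows and store the result row.

    rows is a list of equal-length bit lists, one list per row of cells.
    The function mutates rows in place and also returns the result row.
    """
    if len(rows) < 3:
        raise ValueError("at least three rows are required")
    rows[dst] = [op(a, b) for a, b in zip(rows[src_a], rows[src_b])]
    return rows[dst]
```

With rows `[0, 0, 1, 1]` and `[0, 1, 0, 1]` as operands, the third row becomes `[1, 0, 0, 0]`.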
-
Publication No.: US20180039443A1
Publication date: 2018-02-08
Application No.: US15285437
Filing date: 2016-10-04
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Hongzhong ZHENG, Krishna MALLADI, Dimin NIU
IPC: G06F3/06, G06F13/42, G06F12/1009
CPC classification number: G06F3/0641, G06F3/0608, G06F3/0619, G06F3/065, G06F3/0683, G06F12/1009, G06F13/4282, G06F2212/1044
Abstract: A memory module has logic including a programming register, a deduplication ratio control logic, and a deduplication engine. The programming register stores a maximum deduplication ratio of the memory module. The deduplication ratio control logic is configured to control a deduplication ratio of the memory module according to the maximum deduplication ratio. The deduplication ratio is programmable by a host computer.
-
Publication No.: US20220367412A1
Publication date: 2022-11-17
Application No.: US17873120
Filing date: 2022-07-25
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng GU, Krishna MALLADI, Hongzhong ZHENG
IPC: H01L25/065, H01L31/12, H01L31/02, H01L31/0232, H01L25/18, H04B10/80, H04Q11/00, G02F1/01
Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logic circuit die and configured to transfer the second data between the memory circuit die and the logic circuit die.
-
Publication No.: US20200184001A1
Publication date: 2020-06-11
Application No.: US16388860
Filing date: 2019-04-18
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
IPC: G06F17/16, G06F12/0877
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum a first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.