-
Publication No.: US12164593B2
Publication date: 2024-12-10
Application No.: US17374988
Filing date: 2021-07-13
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng Gu , Krishna Malladi , Hongzhong Zheng , Dimin Niu
IPC: G06F17/16 , G06F12/0802 , G06F12/0877 , G06N3/008 , G06N3/045 , G06N3/063 , G06N3/08
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
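Illustrative note (not part of the patent record): the abstract describes obtaining products by using one operand as a row address and the other as a column address into a lookup table, then summing with adders. A minimal software sketch of that idea follows. The 4-bit operand width and the names `LUT`, `lut_dot`, and `lut_gemm` are assumptions made for illustration; the hierarchical lookup and systolic propagation mentioned in the abstract are not modeled.

```python
# Illustrative sketch only; operand width and all names are assumptions.
BITS = 4
SIZE = 1 << BITS

# Precomputed product table: LUT[a][b] holds a * b for every operand pair.
# The table is built once; lookups afterwards need no multiply operation.
LUT = [[a * b for b in range(SIZE)] for a in range(SIZE)]


def lut_dot(row_vec, col_vec):
    """Dot product using table lookups and additions only."""
    acc = 0
    for a, b in zip(row_vec, col_vec):
        acc += LUT[a][b]  # a acts as the row address, b as the column address
    return acc


def lut_gemm(A, B):
    """C = A @ B for small unsigned-integer matrices, built on lut_dot."""
    cols = list(zip(*B))  # columns of B
    return [[lut_dot(row, col) for col in cols] for row in A]


C = lut_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Here every matrix entry must fit in the assumed 4-bit range (0 to 15); a wider operand would need a larger table or a decomposition into smaller lookups.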
-
Publication No.: US12130884B2
Publication date: 2024-10-29
Application No.: US17374988
Filing date: 2021-07-13
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng Gu , Krishna Malladi , Hongzhong Zheng , Dimin Niu
IPC: G06F17/16 , G06F12/0802 , G06F12/0877 , G06N3/008 , G06N3/045 , G06N3/063 , G06N3/08
CPC classification number: G06F17/16 , G06F12/0802 , G06F12/0877 , G06N3/008 , G06N3/045 , G06N3/063 , G06F2212/1024 , G06F2212/1036 , G06F2212/22 , G06N3/08
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum the first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
-
Publication No.: US11940922B2
Publication date: 2024-03-26
Application No.: US18081488
Filing date: 2022-12-14
Applicant: Samsung Electronics Co., Ltd.
Inventor: Mu-Tien Chang , Krishna T. Malladi , Dimin Niu , Hongzhong Zheng
IPC: G06F12/0875 , G06F13/12 , G06F13/16 , G06F9/30
CPC classification number: G06F12/0875 , G06F13/124 , G06F13/1636 , G06F13/1689 , G06F9/3001 , G06F9/30098 , G06F2212/452
Abstract: A method of processing in-memory commands in a high-bandwidth memory (HBM) system includes sending a function-in-HBM (FIM) instruction to the HBM by an HBM memory controller of a GPU. A logic component of the HBM receives the FIM instruction and coordinates the instruction's execution using the controller, an ALU, and an SRAM located on the logic component.
-
Publication No.: US11934669B2
Publication date: 2024-03-19
Application No.: US16942641
Filing date: 2020-07-29
Applicant: Samsung Electronics Co., Ltd.
Inventor: Dimin Niu , Shuangchen Li , Bob Brennan , Krishna T. Malladi , Hongzhong Zheng
IPC: G06F3/06 , G06F15/78 , G11C11/4096
CPC classification number: G06F3/0631 , G06F3/0604 , G06F3/067 , G06F15/7821 , G11C11/4096
Abstract: A processor includes a plurality of memory units, each of the memory units including a plurality of memory cells, wherein each of the memory units is configurable to operate as memory, as a computation unit, or as a hybrid memory-computation unit.
-
Publication No.: US20210406202A1
Publication date: 2021-12-30
Application No.: US17469769
Filing date: 2021-09-08
Applicant: Samsung Electronics Co., Ltd.
Inventor: Krishna T. Malladi , Hongzhong Zheng , Dimin Niu , Peng Gu
Abstract: A high bandwidth memory (HBM) system includes a first HBM+ card. The first HBM+ card includes a plurality of HBM+ cubes. Each HBM+ cube has a logic die and a memory die. The first HBM+ card also includes an HBM+ card controller coupled to each of the plurality of HBM+ cubes and configured to interface with a host, a pin connection configured to connect to the host, and a fabric connection configured to connect to at least one other HBM+ card.
-
Publication No.: US11151006B2
Publication date: 2021-10-19
Application No.: US16150239
Filing date: 2018-10-02
Applicant: Samsung Electronics Co., Ltd.
Inventor: Dimin Niu , Krishna Malladi , Hongzhong Zheng
Abstract: According to one general aspect, an apparatus may include a plurality of stacked integrated circuit dies that include a memory cell die and a logic die. The memory cell die may be configured to store data at a memory address. The logic die may include an interface to the stacked integrated circuit dies that is configured to communicate memory accesses between the memory cell die and at least one external device. The logic die may include a reliability circuit configured to ameliorate data errors within the memory cell die. The reliability circuit may include a spare memory configured to store data, and an address table configured to map a memory address associated with an error to the spare memory. The reliability circuit may be configured to determine whether a memory access is associated with an error and, if so, to complete the memory access with the spare memory.
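Illustrative note (not part of the patent record): the address table and spare memory described in the abstract can be pictured as a small remapping layer in front of the main array. The sketch below is a software analogy under stated assumptions. The class name `ReliabilityCircuit`, its methods, and the first-free-slot allocation policy are hypothetical and are not taken from the patent.

```python
class ReliabilityCircuit:
    """Hypothetical model: remap addresses flagged as faulty to spare slots."""

    def __init__(self, spare_slots):
        self.address_table = {}            # faulty address -> spare slot index
        self.spare = [None] * spare_slots  # spare memory
        self.next_free = 0                 # assumed policy: allocate in order

    def mark_faulty(self, addr):
        """Map addr to a spare slot; return False when no spare slot is left."""
        if addr in self.address_table:
            return True
        if self.next_free >= len(self.spare):
            return False
        self.address_table[addr] = self.next_free
        self.next_free += 1
        return True

    def write(self, memory, addr, value):
        slot = self.address_table.get(addr)
        if slot is None:
            memory[addr] = value      # normal access to the memory cell die
        else:
            self.spare[slot] = value  # access completed with the spare memory

    def read(self, memory, addr):
        slot = self.address_table.get(addr)
        return memory[addr] if slot is None else self.spare[slot]


memory = [0] * 8
rc = ReliabilityCircuit(spare_slots=2)
rc.mark_faulty(3)
rc.write(memory, 3, 42)  # redirected to the spare memory
rc.write(memory, 1, 7)   # ordinary write
```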
-
Publication No.: US11029879B2
Publication date: 2021-06-08
Application No.: US15949934
Filing date: 2018-04-10
Applicant: Samsung Electronics Co., Ltd.
Inventor: Dimin Niu , Mu Tien Chang , Hongzhong Zheng , Sun Young Lim , Jae-Gon Lee , Indong Kim
IPC: G06F3/06
Abstract: A method of page size aware scheduling and a non-transitory computer-readable storage medium having recorded thereon a computer program for executing the method of page size aware scheduling are provided. The method includes determining a size of a media page; determining if the media page is open or closed; performing, by a memory controller, a speculative read operation if the media page is determined to be open; and performing, by the memory controller, a regular read operation if the media page is determined to be closed.
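Illustrative note (not part of the patent record): the decision described in the abstract, a speculative read for an open media page and a regular read for a closed one, can be sketched as below. The abstract does not say how the page size is used, so mapping an address to a page index by integer division is an assumption, as are the names `schedule_read` and `open_pages`.

```python
def schedule_read(addr, page_size, open_pages):
    """Pick a read type for addr based on whether its media page is open.

    Assumption: the page index is addr // page_size, and open_pages is the
    set of page indices currently open.
    """
    page = addr // page_size
    if page in open_pages:
        return "speculative_read"
    return "regular_read"


# Page 1 covers addresses 4096..8191 when page_size is 4096.
hit = schedule_read(4100, 4096, open_pages={1})
miss = schedule_read(100, 4096, open_pages={1})
```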
-
Publication No.: US10977118B2
Publication date: 2021-04-13
Application No.: US16276304
Filing date: 2019-02-14
Applicant: Samsung Electronics Co., Ltd.
Inventor: Dimin Niu , Mu-Tien Chang , Hongzhong Zheng , Hyun-Joong Kim , Won-hyung Song , Jangseok Choi
Abstract: A method of correcting a memory error of a dynamic random-access memory module (DRAM) using a double data rate (DDR) interface, the method includes conducting a memory transaction including multiple bursts with a memory controller to send data from data chips of the DRAM to the memory controller, detecting one or more errors using an ECC chip of the DRAM, determining a number of the bursts having the errors using the ECC chip of the DRAM, determining whether the number of the bursts having the errors is greater than a threshold number, determining a type of the errors, and directing the memory controller based on the determined type of the errors, wherein the DRAM includes a single ECC chip per memory channel.
-
Publication No.: US20210096999A1
Publication date: 2021-04-01
Application No.: US17121488
Filing date: 2020-12-14
Applicant: Samsung Electronics Co., Ltd.
Inventor: Mu-Tien Chang , Krishna T. Malladi , Dimin Niu , Hongzhong Zheng
IPC: G06F12/0875 , G06F13/16 , G06F13/12
Abstract: A method of processing in-memory commands in a high-bandwidth memory (HBM) system includes sending a function-in-HBM (FIM) instruction to the HBM by an HBM memory controller of a GPU. A logic component of the HBM receives the FIM instruction and coordinates the instruction's execution using the controller, an ALU, and an SRAM located on the logic component.
-
Publication No.: US10915451B2
Publication date: 2021-02-09
Application No.: US16439613
Filing date: 2019-06-12
Applicant: Samsung Electronics Co., Ltd.
Inventor: Krishna T. Malladi , Mu-Tien Chang , Dimin Niu , Hongzhong Zheng
IPC: G06F12/08 , G06F12/0879 , G11C11/417
Abstract: A high bandwidth memory system. In some embodiments, the system includes: a memory stack having a plurality of memory dies and eight 128-bit channels; and a logic die, the memory dies being stacked on, and connected to, the logic die; wherein the logic die may be configured to operate a first channel of the 128-bit channels in: a first mode, in which a first 64 bits operate in pseudo-channel mode, and a second 64 bits operate as two 32-bit fine-grain channels, or a second mode, in which the first 64 bits operate as two 32-bit fine-grain channels, and the second 64 bits operate as two 32-bit fine-grain channels.
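Illustrative note (not part of the patent record): the two operating modes of a single 128-bit channel described in the abstract can be written out as lists of sub-channel widths. The function name `channel_layout` and the representation of a pseudo-channel half as a single 64-bit entry are assumptions made for illustration.

```python
def channel_layout(mode):
    """Return the sub-channel widths, in bits, of one 128-bit channel."""
    pseudo_half = [64]          # 64 bits operating in pseudo-channel mode
    fine_grain_half = [32, 32]  # 64 bits split into two fine-grain channels
    if mode == 1:
        layout = pseudo_half + fine_grain_half
    elif mode == 2:
        layout = fine_grain_half + fine_grain_half
    else:
        raise ValueError("mode must be 1 or 2")
    assert sum(layout) == 128   # both modes cover the full channel width
    return layout


first_mode = channel_layout(1)
second_mode = channel_layout(2)
```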