-
Publication No.: US10732929B2
Publication Date: 2020-08-04
Application No.: US15916196
Filing Date: 2018-03-08
Applicant: Samsung Electronics Co., Ltd.
Inventor: Krishna T. Malladi, Peng Gu, Hongzhong Zheng, Robert Brennan
Abstract: A computing accelerator using a lookup table. The accelerator may accelerate floating-point multiplications by retrieving the fraction portion of the product of two floating-point operands from a lookup table, or by retrieving the product of two floating-point operands from a lookup table, or it may retrieve dot products of floating-point vectors from a lookup table. The accelerator may be implemented in a three-dimensional memory assembly. It may use approximation, the symmetry of a multiplication lookup table, and zero-skipping to improve performance.
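Illustrative sketch (not taken from the patent): the fraction-lookup idea in the abstract above can be modeled in a few lines of Python. The 4-bit fraction width, the function names, and the dictionary-based table are all assumptions made for illustration. The table stores only the products of significands, truncating the fraction models the approximation, storing only pairs with f_a <= f_b models the symmetry, and returning early on a zero operand models zero-skipping.

```python
import math

FRAC_BITS = 4  # assumed fraction width; real formats use more bits


def build_fraction_lut(bits=FRAC_BITS):
    """Products of significands (1.f_a * 1.f_b), stored only for f_a <= f_b."""
    n = 1 << bits
    lut = {}
    for fa in range(n):
        for fb in range(fa, n):
            # Fixed-point product, scaled by 2**(2 * bits).
            lut[(fa, fb)] = (n + fa) * (n + fb)
    return lut


def decompose(x, bits=FRAC_BITS):
    """Split a nonzero x into (is_negative, exponent, truncated fraction index)."""
    m, e = math.frexp(abs(x))            # abs(x) == m * 2**e with 0.5 <= m < 1
    sig = m * 2                          # significand in [1, 2)
    frac = int((sig - 1) * (1 << bits))  # truncation is the approximation step
    return x < 0, e - 1, frac


def lut_multiply(a, b, lut, bits=FRAC_BITS):
    """Approximate a * b using a table lookup for the fraction product."""
    if a == 0 or b == 0:
        return 0.0                       # zero-skipping: no lookup needed
    sa, ea, fa = decompose(a, bits)
    sb, eb, fb = decompose(b, bits)
    key = (fa, fb) if fa <= fb else (fb, fa)  # symmetry halves the table
    prod = lut[key] / float(1 << (2 * bits))
    result = math.ldexp(prod, ea + eb)   # exponents are simply added
    return -result if sa != sb else result


lut = build_fraction_lut()
print(lut_multiply(1.5, 2.0, lut))  # 3.0
```

With a 4-bit fraction the symmetric table holds 136 entries instead of 256. Operands whose fractions need more than 4 bits are truncated, so their products are only approximate.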
-
Publication No.: US20190187898A1
Publication Date: 2019-06-20
Application No.: US15916228
Filing Date: 2018-03-08
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
CPC classification number: G06F3/064, G06F3/0604, G06F3/0673, G06N3/08
Abstract: A storage device and method of controlling a storage device are disclosed. The storage device includes a host, a logic die, and a high bandwidth memory stack including a memory die. A computation lookup table is stored on a memory array of the memory die. The host sends a command to perform an operation that uses a kernel and a plurality of input feature maps and that includes finding the product of a weight of the kernel and values of multiple input feature maps. The computation lookup table includes a row corresponding to a weight of the kernel, and a column corresponding to a value of the input feature maps. A result value stored at a position corresponding to a row and a column is the product of the weight corresponding to the row and the value corresponding to the column.
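Illustrative sketch (not taken from the patent): the row/column layout described in the abstract above can be modeled as a two-dimensional table indexed by weight and input value. The value ranges (signed 4-bit weights, unsigned 4-bit feature-map values) and all names below are assumptions for illustration only. One weight selects one row, and that same row then serves the values drawn from several input feature maps.

```python
# Assumed quantization ranges, chosen only to keep the table small.
WEIGHTS = list(range(-8, 8))   # one table row per possible kernel weight
VALUES = list(range(16))       # one table column per possible input value


def build_computation_lut():
    """lut[row][col] holds WEIGHTS[row] * VALUES[col]."""
    return [[w * v for v in VALUES] for w in WEIGHTS]


def weight_times_feature_maps(lut, weight, values):
    """Multiply one kernel weight by values from several input feature maps.

    The row is selected once for the weight and reused for every value,
    so each product is a column read rather than a multiplication.
    """
    row = lut[WEIGHTS.index(weight)]
    return [row[VALUES.index(v)] for v in values]


lut = build_computation_lut()
print(weight_times_feature_maps(lut, 3, [0, 5, 15]))  # [0, 15, 45]
```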
-
Publication No.: US20240028332A1
Publication Date: 2024-01-25
Application No.: US18375874
Filing Date: 2023-10-02
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
CPC classification number: G06F9/3001, G06F12/0207, G06F17/16, G06F7/00, G06F7/4876, G06F9/3004, G06F2212/1024
Abstract: According to some example embodiments of the present disclosure, a method for a memory lookup mechanism in a high-bandwidth memory system includes: using a memory die to conduct a multiplication operation using a lookup table (LUT) methodology by accessing a LUT, which includes floating-point operation results, stored on the memory die; sending, by the memory die, a result of the multiplication operation to a logic die including a processor and a buffer; and conducting, by the logic die, a matrix multiplication operation using computation units.
-
Publication No.: US20230289081A1
Publication Date: 2023-09-14
Application No.: US18315821
Filing Date: 2023-05-11
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
CPC classification number: G06F3/064, G06N3/08, G06F3/0673, G06F3/0604
Abstract: A storage device and method of controlling a storage device are disclosed. The storage device includes a host, a logic die, and a high bandwidth memory stack including a memory die. A computation lookup table is stored on a memory array of the memory die. The host sends a command to perform an operation that uses a kernel and a plurality of input feature maps and that includes finding the product of a weight of the kernel and values of multiple input feature maps. The computation lookup table includes a row corresponding to a weight of the kernel, and a column corresponding to a value of the input feature maps. A result value stored at a position corresponding to a row and a column is the product of the weight corresponding to the row and the value corresponding to the column.
-
Publication No.: US20210405877A1
Publication Date: 2021-12-30
Application No.: US17473532
Filing Date: 2021-09-13
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng Gu, Krishna T. Malladi, Hongzhong Zheng
Abstract: A storage device and method of controlling a storage device are disclosed. The storage device includes a host, a logic die, and a high bandwidth memory stack including a memory die. A computation lookup table is stored on a memory array of the memory die. The host sends a command to perform an operation that uses a kernel and a plurality of input feature maps and that includes finding the product of a weight of the kernel and values of multiple input feature maps. The computation lookup table includes a row corresponding to a weight of the kernel, and a column corresponding to a value of the input feature maps. A result value stored at a position corresponding to a row and a column is the product of the weight corresponding to the row and the value corresponding to the column.
-
Publication No.: US11100193B2
Publication Date: 2021-08-24
Application No.: US16388860
Filing Date: 2019-04-18
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng Gu, Krishna Malladi, Hongzhong Zheng, Dimin Niu
IPC: G06F17/16, G06F12/0877, G06F12/0802, G06N3/063, G06N3/00, G06N3/04, G06N3/08
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum a first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
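Illustrative sketch (not taken from the patent): the dataflow in the abstract above, where one vector supplies row addresses, the other supplies column addresses, and adders accumulate the looked-up products, can be modeled as follows. The 16 x 16 table of unsigned integer products and the function names are assumptions for illustration. The hierarchical lookup and the systolic propagation are not modeled.

```python
def build_lut(n=16):
    """lut[r][c] holds r * c for operands in [0, n)."""
    return [[r * c for c in range(n)] for r in range(n)]


def lut_dot(lut, a, b):
    """Dot product of two vectors using lookups and additions only."""
    acc = 0
    for r, c in zip(a, b):   # a[i] is the row address, b[i] the column address
        acc += lut[r][c]     # the lookup replaces the multiply; the adder accumulates
    return acc


def lut_gemm(lut, A, B):
    """C = A @ B, where every scalar product is read from the table."""
    cols = list(zip(*B))     # columns of B
    return [[lut_dot(lut, row, col) for col in cols] for row in A]


lut = build_lut()
print(lut_gemm(lut, [[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```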
-
Publication No.: US12164593B2
Publication Date: 2024-12-10
Application No.: US17374988
Filing Date: 2021-07-13
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng Gu, Krishna Malladi, Hongzhong Zheng, Dimin Niu
IPC: G06F17/16, G06F12/0802, G06F12/0877, G06N3/008, G06N3/045, G06N3/063, G06N3/08
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum a first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
-
Publication No.: US12130884B2
Publication Date: 2024-10-29
Application No.: US17374988
Filing Date: 2021-07-13
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng Gu, Krishna Malladi, Hongzhong Zheng, Dimin Niu
IPC: G06F17/16, G06F12/0802, G06F12/0877, G06N3/008, G06N3/045, G06N3/063, G06N3/08
CPC classification number: G06F17/16, G06F12/0802, G06F12/0877, G06N3/008, G06N3/045, G06N3/063, G06F2212/1024, G06F2212/1036, G06F2212/22, G06N3/08
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum a first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
-
Publication No.: US20210406202A1
Publication Date: 2021-12-30
Application No.: US17469769
Filing Date: 2021-09-08
Applicant: Samsung Electronics Co., Ltd.
Inventor: Krishna T. Malladi, Hongzhong Zheng, Dimin Niu, Peng Gu
Abstract: A high bandwidth memory (HBM) system includes a first HBM+ card. The first HBM+ card includes a plurality of HBM+ cubes. Each HBM+ cube has a logic die and a memory die. The first HBM+ card also includes an HBM+ card controller coupled to each of the plurality of HBM+ cubes and configured to interface with a host, a pin connection configured to connect to the host, and a fabric connection configured to connect to at least one HBM+ card.
-
Publication No.: US20190212980A1
Publication Date: 2019-07-11
Application No.: US15916196
Filing Date: 2018-03-08
Applicant: Samsung Electronics Co., Ltd.
Inventor: Krishna T. Malladi, Peng Gu, Hongzhong Zheng, Robert Brennan
Abstract: A computing accelerator using a lookup table. The accelerator may accelerate floating-point multiplications by retrieving the fraction portion of the product of two floating-point operands from a lookup table, or by retrieving the product of two floating-point operands from a lookup table, or it may retrieve dot products of floating-point vectors from a lookup table. The accelerator may be implemented in a three-dimensional memory assembly. It may use approximation, the symmetry of a multiplication lookup table, and zero-skipping to improve performance.