-
Publication No.: US20210374210A1
Publication date: 2021-12-02
Application No.: US17374988
Filing date: 2021-07-13
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
IPC: G06F17/16, G06F12/0877, G06F12/0802, G06N3/063, G06N3/00, G06N3/04
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum a first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.
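The lookup-based product described in this abstract can be illustrated with a minimal software sketch. It is not the patented circuit: the 4-bit operand width, the table layout, and the names `LUT`, `lut_dot`, and `lut_gemm` are all assumptions made for illustration. One operand serves as the row address and the other as the column address, and only lookups and additions occur at compute time.

```python
# Hypothetical sketch; the 4-bit operand width is an assumption, not from the abstract.
BITS = 4
SIZE = 1 << BITS

# The table is filled once ahead of time; LUT[row][col] holds row * col.
LUT = [[row * col for col in range(SIZE)] for row in range(SIZE)]


def lut_dot(first_vector, second_vector):
    """Dot product that uses only table lookups and additions at compute time."""
    total = 0
    for row_addr, col_addr in zip(first_vector, second_vector):
        total += LUT[row_addr][col_addr]  # the lookup stands in for the multiply
    return total


def lut_gemm(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n), both given as lists of lists."""
    columns = list(zip(*b))
    return [[lut_dot(row, col) for col in columns] for row in a]
```

Under these assumptions every operand must fit in 4 bits (0 to 15); wider operands would need a larger table or a decomposition into narrower lookups.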
-
Publication No.: US20190214365A1
Publication date: 2019-07-11
Application No.: US15911063
Filing date: 2018-03-02
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng GU, Krishna MALLADI, Hongzhong ZHENG
IPC: H01L25/065, H01L31/12, H01L31/02, H01L31/0232, H01L25/18, G02F1/01, H04B10/80, H04Q11/00
Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logic circuit die and configured to transfer the second data between the memory circuit die and the logic circuit die.
-
Publication No.: US20200183837A1
Publication date: 2020-06-11
Application No.: US16388863
Filing date: 2019-04-18
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
IPC: G06F12/0802, G06F17/16
Abstract: A tensor computation dataflow accelerator semiconductor circuit is disclosed. The dataflow accelerator includes a DRAM bank and a peripheral array of multiply-and-add units disposed adjacent to the DRAM bank. The peripheral array of multiply-and-add units is configured to form a pipelined dataflow chain in which partial output data from one multiply-and-add unit from among the array of multiply-and-add units is fed into another multiply-and-add unit from among the array of multiply-and-add units for data accumulation. Near-DRAM-processing dataflow (NDP-DF) accelerator unit dies may be stacked atop a base die. The base die may be disposed on a passive silicon interposer adjacent to a processor or a controller. The NDP-DF accelerator units may process partial matrix output data in parallel. The partial matrix output data may be propagated in a forward or backward direction. The tensor computation dataflow accelerator may perform a partial matrix transposition.
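The pipelined chain in this abstract, where each multiply-and-add unit forwards its partial output to the next unit for accumulation, can be modeled cycle by cycle in software. The sketch below is illustrative only: the function name `run_mac_chain`, the assumption that each unit holds one fixed weight, and the one-register-per-unit timing are hypothetical and not taken from the patent.

```python
def run_mac_chain(weights, input_vectors):
    """Simulate a chain of multiply-and-add units, one pipeline stage per unit.

    Assumption: unit i holds weights[i] and consumes element i of each input
    vector. regs[i] is the output register of unit i and stores a pair
    (vector_index, partial_sum), or None when the stage is idle.
    """
    n = len(weights)
    regs = [None] * n
    outputs = []
    total_cycles = len(input_vectors) + n - 1  # fill time plus drain time
    for cycle in range(total_cycles):
        # Update from the last unit back to the first so that each unit reads
        # the value its upstream neighbor produced in the previous cycle.
        for i in reversed(range(1, n)):
            upstream = regs[i - 1]
            if upstream is None:
                regs[i] = None
            else:
                idx, partial = upstream
                regs[i] = (idx, partial + weights[i] * input_vectors[idx][i])
        # The first unit starts a new vector on each cycle while inputs remain.
        if cycle < len(input_vectors):
            regs[0] = (cycle, weights[0] * input_vectors[cycle][0])
        else:
            regs[0] = None
        # A completed accumulation leaves the chain from the last unit.
        if regs[n - 1] is not None:
            outputs.append(regs[n - 1][1])
    return outputs
```

In this model a new input vector enters the chain every cycle, so once the pipeline is full one accumulated result leaves the last unit per cycle.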
-
Publication No.: US20220367412A1
Publication date: 2022-11-17
Application No.: US17873120
Filing date: 2022-07-25
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng GU, Krishna MALLADI, Hongzhong ZHENG
IPC: H01L25/065, H01L31/12, H01L31/02, H01L31/0232, H01L25/18, H04B10/80, H04Q11/00, G02F1/01
Abstract: According to one general aspect, an apparatus may include a memory circuit die configured to store a lookup table that converts first data to second data. The apparatus may also include a logic circuit die comprising combinatorial logic circuits configured to receive the second data. The apparatus may further include an optical via coupled between the memory circuit die and the logic circuit die and configured to transfer the second data between the memory circuit die and the logic circuit die.
-
Publication No.: US20200184001A1
Publication date: 2020-06-11
Application No.: US16388860
Filing date: 2019-04-18
Applicant: Samsung Electronics Co., Ltd.
Inventor: Peng GU, Krishna MALLADI, Hongzhong ZHENG, Dimin NIU
IPC: G06F17/16, G06F12/0877
Abstract: A general matrix-matrix multiplication (GEMM) dataflow accelerator circuit is disclosed that includes a smart 3D stacking DRAM architecture. The accelerator circuit includes a memory bank, a peripheral lookup table stored in the memory bank, and a first vector buffer to store a first vector that is used as a row address into the lookup table. The circuit includes a second vector buffer to store a second vector that is used as a column address into the lookup table, and lookup table buffers to receive and store lookup table entries from the lookup table. The circuit further includes adders to sum a first product and a second product, and an output buffer to store the sum. The lookup table buffers determine a product of the first vector and the second vector without performing a multiply operation. The embodiments include a hierarchical lookup architecture to reduce latency. Accumulation results are propagated in a systolic manner.