-
公开(公告)号:US12099453B2
公开(公告)日:2024-09-24
申请号:US17709031
申请日:2022-03-30
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: G06F13/16 , G11C8/12 , H03K19/1776
CPC classification number: G06F13/161 , G06F13/1673 , G06F13/1689 , G11C8/12 , H03K19/1776
Abstract: Embodiments of the present disclosure relate to application partitioning for locality in a stacked memory system. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles and each memory die includes multiple memory tiles. Vertically aligned memory tiles are directly coupled to and comprise the local memory block for a corresponding processing tile. An application program that operates on dense multi-dimensional arrays (matrices) may partition the dense arrays into sub-arrays associated with program tiles. Each program tile is executed by a processing tile using the processing tile's local memory block to process the associated sub-array. Data associated with each sub-array is stored in a local memory block and the processing tile corresponding to the local memory block executes the program tile to process the sub-array data.
-
公开(公告)号:US20230297269A1
公开(公告)日:2023-09-21
申请号:US17683292
申请日:2022-02-28
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O’Connor
IPC: G06F3/06
CPC classification number: G06F3/0655 , G06F3/0604 , G06F3/0679
Abstract: A hierarchical network enables access for a stacked memory system including or more memory dies that each include multiple memory tiles. The processor die includes multiple processing tiles that are stacked with the one or more memory die. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile’s local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50x for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10x.
-
公开(公告)号:US20230275068A1
公开(公告)日:2023-08-31
申请号:US17683290
申请日:2022-02-28
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: H01L25/065
CPC classification number: H01L25/0657 , H01L2225/06565 , H01L27/11517
Abstract: Embodiments of the present disclosure relate to memory stacked on processor for high bandwidth. Systems and methods are disclosed for providing a one-level memory for a processing system by stacking bulk memory on a processor die. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles, where each tile includes a processing unit, mapper, and tile network. Each memory die includes multiple memory tiles. The processing tile is coupled to each memory tile that is above or below the processing tile. The vertically aligned memory tiles comprise the local memory block for the processing tile. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.
-
公开(公告)号:US10164638B2
公开(公告)日:2018-12-25
申请号:US15908242
申请日:2018-02-28
Applicant: NVIDIA Corporation
Inventor: John Michael Wilson , John W. Poulton , Matthew Rudolph Fojtik , Carl Thomas Gray
IPC: H03K19/0175 , H03K19/0185
Abstract: A balanced, charge-recycling repeater link is disclosed. The link includes a first set of segments operating in a first voltage domain and a second set of segments operating in a second voltage domain. The link is configured to transmit a first signal over at least one segment in the first set of segments and at least one other segment in the second set of segments. Each segment of the link includes at least one active circuit element configured to charge or discharge one or more corresponding interconnects within the link and a level shifter configured to shift the level of a signal on a last interconnect of the segment from the first voltage domain to the second voltage domain or the second voltage domain to the first voltage domain.
-
公开(公告)号:US09954527B2
公开(公告)日:2018-04-24
申请号:US14869759
申请日:2015-09-29
Applicant: NVIDIA Corporation
Inventor: John Michael Wilson , John W. Poulton , Matthew Rudolph Fojtik , Carl Thomas Gray
IPC: H03K19/0175 , H03K19/0185
CPC classification number: H03K19/018528
Abstract: A balanced, charge-recycling repeater link is disclosed. The link includes a first set of segments operating in a first voltage domain and a second set of segments operating in a second voltage domain. The link is configured to transmit a first signal over at least one segment in the first set of segments and at least one other segment in the second set of segments. Each segment of the link includes at least one active circuit element configured to charge or discharge one or more corresponding interconnects within the link and a level shifter configured to shift the level of a signal on a last interconnect of the segment from the first voltage domain to the second voltage domain or the second voltage domain to the first voltage domain.
-
公开(公告)号:US20190173380A1
公开(公告)日:2019-06-06
申请号:US16248373
申请日:2019-01-15
Applicant: NVIDIA Corporation
Inventor: Sudhir Shrikantha Kudva , William J. Dally , Thomas Hastings Greer, III , Carl Thomas Gray
Abstract: A system and method are provided for controlling a modified buck converter circuit. A pull-up switching mechanism that is coupled to an upstream terminal of an inductor within a modified buck converter circuit is enabled. A load current at the output of the modified buck regulator circuit is measured. A capacitor current associated with a capacitor that is coupled to a downstream terminal of the inductor is continuously sensed and the pull-up switching mechanism is disabled when the capacitor current is greater than a sum of the load current and an enabling current value.
-
公开(公告)号:US20170279355A1
公开(公告)日:2017-09-28
申请号:US15080461
申请日:2016-03-24
Applicant: NVIDIA Corporation
Inventor: Sudhir Shrikantha Kudva , William J. Dally , Thomas Hastings Greer, III , Carl Thomas Gray
CPC classification number: H02M3/158 , H02M1/08 , H02M3/156 , H02M2001/0009 , H02M2001/0019 , H02M2003/1566
Abstract: A system and method are provided for controlling a modified buck converter circuit. A pull-up switching mechanism that is coupled to an upstream terminal of an inductor within a modified buck converter circuit is enabled. A load current at the output of the modified buck regulator circuit is measured. A capacitor current associated with a capacitor that is coupled to a downstream terminal of the inductor is continuously sensed and the pull-up switching mechanism is disabled when the capacitor current is greater than a sum of the load current and an enabling current value.
-
公开(公告)号:US12223201B2
公开(公告)日:2025-02-11
申请号:US18438139
申请日:2024-02-09
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: G06F3/06
Abstract: A hierarchical network enables access for a stacked memory system including or more memory dies that each include multiple memory tiles. The processor die includes multiple processing tiles that are stacked with the one or more memory die. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.
-
公开(公告)号:US20240211166A1
公开(公告)日:2024-06-27
申请号:US18438139
申请日:2024-02-09
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: G06F3/06
CPC classification number: G06F3/0655 , G06F3/0604 , G06F3/0679
Abstract: A hierarchical network enables access for a stacked memory system including or more memory dies that each include multiple memory tiles. The processor die includes multiple processing tiles that are stacked with the one or more memory die. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.
-
公开(公告)号:US20180191349A1
公开(公告)日:2018-07-05
申请号:US15908242
申请日:2018-02-28
Applicant: NVIDIA Corporation
Inventor: John Michael Wilson , John W. Poulton , Matthew Rudolph Fojtik , Carl Thomas Gray
IPC: H03K19/0185
CPC classification number: H03K19/018528
Abstract: A balanced, charge-recycling repeater link is disclosed. The link includes a first set of segments operating in a first voltage domain and a second set of segments operating in a second voltage domain. The link is configured to transmit a first signal over at least one segment in the first set of segments and at least one other segment in the second set of segments. Each segment of the link includes at least one active circuit element configured to charge or discharge one or more corresponding interconnects within the link and a level shifter configured to shift the level of a signal on a last interconnect of the segment from the first voltage domain to the second voltage domain or the second voltage domain to the first voltage domain.
-
-
-
-
-
-
-
-
-