Patent search: ap:("NVIDIA Corporation") AND inv:"Stephen W. Keckler" (page 1)

1.

Invention application
AUGMENTING AND DYNAMICALLY CONFIGURING A NEURAL NETWORK MODEL FOR REAL-TIME SYSTEMS (Valid)

Publication (announcement) number: US20230111375A1

Publication (announcement) date: 2023-04-13

Application number: US17724819

Application date: 2022-04-20

Applicant: NVIDIA Corporation

Inventors: Jason Lavar Clemons, Kavya Sreedhar, Stephen W. Keckler

IPC: G06N3/08, G06N3/04, G06F11/34

Abstract: A neural network model is augmented for dynamic configuration and execution in real time according to performance constraints. In an embodiment, the neural network model is a transformer neural network model. The performance constraints may include a metric, such as inferencing execution time or energy consumption, and a target value for the metric. The augmented neural network model is characterized for various configurations, and settings are determined corresponding to a variety of the performance constraints. One or more performance constraints may be provided as an input to dynamically select a configuration of the augmented neural network model. Through dynamic configuration, the augmented neural network model may adapt to real-time changes in the performance constraints. Notably, the trained weights of the original (pre-augmentation) neural network model may be used by the augmented neural network model without modification.
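The abstract does not say how a configuration is picked once the model has been characterized. The following is a minimal sketch that assumes the characterization step yields a table of configurations with measured latency, energy, and accuracy, and that the selection rule is "the most accurate configuration that meets the target value". `Config`, `select_config`, the configuration names, and every number in the table are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    """One characterized configuration of the augmented model (hypothetical)."""
    name: str
    latency_ms: float  # measured inference execution time
    energy_mj: float   # measured energy per inference
    accuracy: float    # measured task accuracy


def select_config(configs: List[Config], metric: str, target: float) -> Optional[Config]:
    """Pick the most accurate configuration whose characterized value for
    `metric` ("latency_ms" or "energy_mj") stays within `target`.
    Returns None when no configuration satisfies the constraint."""
    if metric not in ("latency_ms", "energy_mj"):
        raise ValueError(f"unsupported metric: {metric}")
    feasible = [c for c in configs if getattr(c, metric) <= target]
    return max(feasible, key=lambda c: c.accuracy, default=None)


# Illustrative characterization table; the numbers are made up.
PROFILE = [
    Config("full", latency_ms=12.0, energy_mj=40.0, accuracy=0.91),
    Config("reduced_a", latency_ms=9.0, energy_mj=31.0, accuracy=0.89),
    Config("reduced_b", latency_ms=6.5, energy_mj=22.0, accuracy=0.85),
]

if __name__ == "__main__":
    # The constraint may change at run time; re-select whenever it does.
    for budget in (15.0, 10.0, 7.0, 5.0):
        chosen = select_config(PROFILE, "latency_ms", budget)
        print(budget, chosen.name if chosen else "no feasible configuration")
```

Because selection is a table lookup over precomputed measurements, it is cheap enough to repeat on every change in the constraint, which is what real-time adaptation requires.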

2.

Invention application
PROCESSOR AND MEMORY COMMUNICATION IN A STACKED MEMORY SYSTEM (Valid)

Publication (announcement) number: US20240411709A1

Publication (announcement) date: 2024-12-12

Application number: US18810657

Application date: 2024-08-21

Applicant: NVIDIA Corporation

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O'Connor

IPC: G06F13/16, G11C8/12, H03K19/1776

Abstract: Embodiments of the present disclosure relate to application partitioning for locality in a stacked memory system. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles and each memory die includes multiple memory tiles. Vertically aligned memory tiles are directly coupled to and comprise the local memory block for a corresponding processing tile. An application program that operates on dense multi-dimensional arrays (matrices) may partition the dense arrays into sub-arrays associated with program tiles. Each program tile is executed by a processing tile using the processing tile's local memory block to process the associated sub-array. Data associated with each sub-array is stored in a local memory block and the processing tile corresponding to the local memory block executes the program tile to process the sub-array data.

3.

Invention publication
APPLICATION PARTITIONING FOR LOCALITY IN A STACKED MEMORY SYSTEM (Pending, published)

Publication (announcement) number: US20230315651A1

Publication (announcement) date: 2023-10-05

Application number: US17709031

Application date: 2022-03-30

Applicant: NVIDIA Corporation

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O'Connor

IPC: G06F13/16, H03K19/1776, G11C8/12

CPC classification number: G06F13/161, G06F13/1689, G06F13/1673, H03K19/1776, G11C8/12

Abstract: Embodiments of the present disclosure relate to application partitioning for locality in a stacked memory system. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles and each memory die includes multiple memory tiles. Vertically aligned memory tiles are directly coupled to and comprise the local memory block for a corresponding processing tile. An application program that operates on dense multi-dimensional arrays (matrices) may partition the dense arrays into sub-arrays associated with program tiles. Each program tile is executed by a processing tile using the processing tile's local memory block to process the associated sub-array. Data associated with each sub-array is stored in a local memory block and the processing tile corresponding to the local memory block executes the program tile to process the sub-array data.
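The partitioning idea can be illustrated with a small simulation. This sketch assumes a 2-D grid of processing tiles, a matrix whose dimensions divide evenly by that grid, and a Python dict standing in for each tile's local memory block; `partition`, `run_program_tiles`, `gather`, and `scale_by_two` are hypothetical names, not an interface described in the patent. The example program tile is element-wise, so it needs no data from other tiles; operations such as block-partitioned matrix multiplication would also need data held by other tiles.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
TileId = Tuple[int, int]  # (row, column) of a processing tile in a hypothetical 2-D grid


def partition(matrix: Matrix, grid_rows: int, grid_cols: int) -> Dict[TileId, Matrix]:
    """Split a dense matrix into contiguous sub-arrays, one per processing tile.
    The returned dict stands in for each tile's local memory block."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    if n_rows % grid_rows or n_cols % grid_cols:
        raise ValueError("matrix dimensions must divide evenly by the tile grid")
    br, bc = n_rows // grid_rows, n_cols // grid_cols
    return {
        (ti, tj): [row[tj * bc:(tj + 1) * bc] for row in matrix[ti * br:(ti + 1) * br]]
        for ti in range(grid_rows)
        for tj in range(grid_cols)
    }


def run_program_tiles(local_memory: Dict[TileId, Matrix],
                      program_tile: Callable[[Matrix], Matrix]) -> Dict[TileId, Matrix]:
    """Each processing tile runs the same program tile on its own sub-array only."""
    return {tile: program_tile(block) for tile, block in local_memory.items()}


def gather(blocks: Dict[TileId, Matrix], grid_rows: int, grid_cols: int) -> Matrix:
    """Reassemble per-tile sub-arrays into one dense matrix (for checking results)."""
    out: Matrix = []
    for ti in range(grid_rows):
        for r in range(len(blocks[(ti, 0)])):
            out.append([x for tj in range(grid_cols) for x in blocks[(ti, tj)][r]])
    return out


def scale_by_two(block: Matrix) -> Matrix:
    # Example program tile: an element-wise operation that touches only local data.
    return [[2.0 * x for x in row] for row in block]


if __name__ == "__main__":
    m = [[1, 2, 3, 4], [5, 6, 7, 8]]
    local = partition(m, 1, 2)
    print(gather(run_program_tiles(local, scale_by_two), 1, 2))
```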

4.

Invention grant
Hierarchical network for stacked memory system (Valid)

Publication (announcement) number: US11977766B2

Publication (announcement) date: 2024-05-07

Application number: US17683292

Application date: 2022-02-28

Applicant: NVIDIA Corporation

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O'Connor

IPC: G06F3/06

CPC classification number: G06F3/0655, G06F3/0604, G06F3/0679

Abstract: A hierarchical network enables access for a stacked memory system including one or more memory dies that each include multiple memory tiles. The processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.
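The four access paths listed in the abstract form a natural hierarchy. The sketch below classifies a memory request into one of those levels from hypothetical (device, stack, tile) coordinates. It assumes one processor die per die stack, so "same stack" and "same processor die" coincide; `Level`, `TileAddress`, and `access_level` are illustrative names, and the sketch says nothing about how the network itself is built.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Access paths named in the abstract, ordered from nearest to farthest."""
    LOCAL_BLOCK = 0    # memory tiles vertically aligned with the requesting tile
    SAME_DIE = 1       # local block of another processing tile on the same processor die
    OTHER_STACK = 2    # memory tiles in a different die stack
    OTHER_DEVICE = 3   # memory tiles in a different device


@dataclass(frozen=True)
class TileAddress:
    """Hypothetical coordinates of a processing tile and its local block.
    Assumes exactly one processor die per die stack."""
    device: int
    stack: int
    tile: int


def access_level(requester: TileAddress, owner: TileAddress) -> Level:
    """Classify which level of the hierarchy serves a request from `requester`
    for memory in the local block owned by `owner`."""
    if requester.device != owner.device:
        return Level.OTHER_DEVICE
    if requester.stack != owner.stack:
        return Level.OTHER_STACK
    if requester.tile != owner.tile:
        return Level.SAME_DIE
    return Level.LOCAL_BLOCK
```

Using `IntEnum` lets software compare levels directly, for example to favor data placements where most accesses resolve to `LOCAL_BLOCK`, the path for which the abstract reports the B:F and energy improvements.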

5.

Invention grant
Hierarchical network for stacked memory system (Valid)

Publication (announcement) number: US12223201B2

Publication (announcement) date: 2025-02-11

Application number: US18438139

Application date: 2024-02-09

Applicant: NVIDIA Corporation

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O'Connor

IPC: G06F3/06

Abstract: A hierarchical network enables access for a stacked memory system including one or more memory dies that each include multiple memory tiles. The processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.

6.

Invention publication
HIERARCHICAL NETWORK FOR STACKED MEMORY SYSTEM (Pending, published)

Publication (announcement) number: US20240211166A1

Publication (announcement) date: 2024-06-27

Application number: US18438139

Application date: 2024-02-09

Applicant: NVIDIA Corporation

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O'Connor

IPC: G06F3/06

CPC classification number: G06F3/0655, G06F3/0604, G06F3/0679

Abstract: A hierarchical network enables access for a stacked memory system including one or more memory dies that each include multiple memory tiles. The processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.

7.

Invention grant
Application partitioning for locality in a stacked memory system (Valid)

Publication (announcement) number: US12099453B2

Publication (announcement) date: 2024-09-24

Application number: US17709031

Application date: 2022-03-30

Applicant: NVIDIA Corporation

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O'Connor

IPC: G06F13/16, G11C8/12, H03K19/1776

CPC classification number: G06F13/161, G06F13/1673, G06F13/1689, G11C8/12, H03K19/1776

Abstract: Embodiments of the present disclosure relate to application partitioning for locality in a stacked memory system. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles and each memory die includes multiple memory tiles. Vertically aligned memory tiles are directly coupled to and comprise the local memory block for a corresponding processing tile. An application program that operates on dense multi-dimensional arrays (matrices) may partition the dense arrays into sub-arrays associated with program tiles. Each program tile is executed by a processing tile using the processing tile's local memory block to process the associated sub-array. Data associated with each sub-array is stored in a local memory block and the processing tile corresponding to the local memory block executes the program tile to process the sub-array data.

8.

Invention publication
AUGMENTING LEGACY NEURAL NETWORKS FOR FLEXIBLE INFERENCE (Pending, published)

Publication (announcement) number: US20230325670A1

Publication (announcement) date: 2023-10-12

Application number: US17820780

Application date: 2022-08-18

Applicant: NVIDIA Corporation

Inventors: Jason Lavar Clemons, Stephen W. Keckler, Iuri Frosio, Jose Manuel Alvarez Lopez, Maying Shen

IPC: G06N3/08

CPC classification number: G06N3/082

Abstract: A technique for dynamically configuring and executing an augmented neural network in real-time according to performance constraints also maintains the legacy neural network execution path. A neural network model that has been trained for a task is augmented with low-compute “shallow” phases paired with each legacy phase and the legacy phases of the neural network model are held constant (e.g., unchanged) while the shallow phases are trained. During inference, one or more of the shallow phases can be selectively executed in place of the corresponding legacy phase. Compared with the legacy phases, the shallow phases are typically less accurate, but have reduced latency and consume less power. Therefore, processing using one or more of the shallow phases in place of one or more of the legacy phases enables the augmented neural network to dynamically adapt to changes in the execution environment (e.g., processing load or performance requirement).
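The abstract states that shallow phases can be selectively executed in place of legacy phases, but not how the selection is made. The sketch below assumes per-phase latency estimates are available and uses a greedy policy (swap in the shallow phases with the largest savings first until the latency budget is met); that policy is an illustrative choice, not the patented method. `Phase`, `choose_shallow`, `run`, and the toy phases (plain arithmetic functions standing in for network layers) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Phase:
    """A legacy phase paired with its shallow alternative (hypothetical)."""
    legacy: Callable[[float], float]   # original trained phase, weights held constant
    shallow: Callable[[float], float]  # low-compute phase trained afterwards
    legacy_ms: float                   # assumed profiled latency of the legacy phase
    shallow_ms: float                  # assumed profiled latency of the shallow phase


def choose_shallow(phases: Sequence[Phase], budget_ms: float) -> Optional[List[bool]]:
    """Greedy policy: swap in the shallow phases with the largest latency savings
    until the estimated total fits the budget. Returns one flag per phase
    (True = run shallow), or None if even the fastest mix misses the budget."""
    use_shallow = [False] * len(phases)
    total = sum(p.legacy_ms for p in phases)
    order = sorted(range(len(phases)),
                   key=lambda i: phases[i].legacy_ms - phases[i].shallow_ms,
                   reverse=True)
    for i in order:
        saving = phases[i].legacy_ms - phases[i].shallow_ms
        if total <= budget_ms or saving <= 0:
            break
        use_shallow[i] = True
        total -= saving
    return use_shallow if total <= budget_ms else None


def run(phases: Sequence[Phase], use_shallow: Sequence[bool], x: float) -> float:
    """Execute the network, taking the legacy or shallow path phase by phase.
    With all flags False this is exactly the legacy execution path."""
    for phase, shallow in zip(phases, use_shallow):
        x = phase.shallow(x) if shallow else phase.legacy(x)
    return x


# Toy phases: simple functions stand in for network layers; latencies are made up.
PHASES = [
    Phase(lambda x: x + 1.0, lambda x: x + 0.5, legacy_ms=10.0, shallow_ms=3.0),
    Phase(lambda x: x * 2.0, lambda x: x * 1.5, legacy_ms=5.0, shallow_ms=4.0),
    Phase(lambda x: x - 1.0, lambda x: x - 0.5, legacy_ms=8.0, shallow_ms=2.0),
]
```

Keeping the legacy callables untouched mirrors the abstract's point that the legacy execution path is preserved: setting every flag to False reproduces the original network.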

9.

Invention publication
HIERARCHICAL NETWORK FOR STACKED MEMORY SYSTEM (Pending, published)

Publication (announcement) number: US20230297269A1

Publication (announcement) date: 2023-09-21

Application number: US17683292

Application date: 2022-02-28

Applicant: NVIDIA Corporation

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O'Connor

IPC: G06F3/06

CPC classification number: G06F3/0655, G06F3/0604, G06F3/0679

Abstract: A hierarchical network enables access for a stacked memory system including one or more memory dies that each include multiple memory tiles. The processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.

10.

Invention publication
MEMORY STACKED ON PROCESSOR FOR HIGH BANDWIDTH (Pending, published)

Publication (announcement) number: US20230275068A1

Publication (announcement) date: 2023-08-31

Application number: US17683290

Application date: 2022-02-28

Applicant: NVIDIA Corporation

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O'Connor

IPC: H01L25/065

CPC classification number: H01L25/0657, H01L2225/06565, H01L27/11517

Abstract: Embodiments of the present disclosure relate to memory stacked on processor for high bandwidth. Systems and methods are disclosed for providing a one-level memory for a processing system by stacking bulk memory on a processor die. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles, where each tile includes a processing unit, mapper, and tile network. Each memory die includes multiple memory tiles. The processing tile is coupled to each memory tile that is above or below the processing tile. The vertically aligned memory tiles comprise the local memory block for the processing tile. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.
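The geometry behind the local memory block can be sketched as follows. The abstract lists a mapper in each processing tile but does not describe the mapping, so `locate` is only one simple hypothetical possibility: the aligned memory tiles are concatenated, starting from memory die 0 (assumed here to be the die nearest the processor). The sketch also assumes every memory die uses the same tile grid as the processor die, so the aligned memory tile has the same index on each die. `local_memory_block`, `locate`, and the sizes in the usage note are illustrative.

```python
from typing import List, Tuple


def local_memory_block(proc_tile: int, num_memory_dies: int) -> List[Tuple[int, int]]:
    """Return (memory_die, memory_tile) pairs vertically aligned with `proc_tile`.
    Assumes every memory die uses the same tile grid as the processor die, so
    the aligned memory tile carries the same index on each die."""
    return [(die, proc_tile) for die in range(num_memory_dies)]


def locate(offset: int, num_memory_dies: int, tile_bytes: int) -> Tuple[int, int]:
    """Hypothetical mapping of a byte offset within a tile's local block to
    (memory_die, offset_within_memory_tile). The aligned memory tiles are
    concatenated, starting from memory die 0 (assumed nearest the processor)."""
    capacity = num_memory_dies * tile_bytes
    if not 0 <= offset < capacity:
        raise ValueError(f"offset {offset} outside local block of {capacity} bytes")
    return divmod(offset, tile_bytes)
```

For example, with four memory dies and 4096-byte memory tiles, processing tile 3 owns memory tile 3 on dies 0 through 3, and local offset 5000 falls 904 bytes into the tile on die 1. Interleaving consecutive lines across dies would be an equally plausible alternative.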
