-
Publication No.: US12130750B2
Publication Date: 2024-10-29
Application No.: US18118020
Filing Date: 2023-03-06
Applicant: NVIDIA Corporation
Inventor: Aninda Manocha, Zi Yan, David Nellans
IPC: G06F12/1027
CPC classification number: G06F12/1027
Abstract: Computer systems often employ virtual address translation hierarchies in which virtual memory addresses are mapped to physical memory. Use of the virtual address translation hierarchy speeds up virtual address translation when the required mapping is stored in one of the higher levels of the hierarchy. To reduce the number of misses occurring in the virtual address translation hierarchy, huge memory pages may be selectively employed; these map larger contiguous regions of virtual memory to contiguous regions of physical memory, thereby increasing the coverage of each entry in the virtual address translation hierarchy. The present disclosure provides hardware support for optimizing this huge memory page selection.
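To make the coverage arithmetic concrete, the sketch below (not taken from the patent; the 1536-entry TLB and 4 KiB/2 MiB page sizes are illustrative assumptions) shows how much further a fixed-size translation cache reaches when its entries map huge pages instead of base pages.

```python
# Illustrative only: TLB "reach" is entries * page size. The page sizes and
# entry count below are assumptions, not figures from the patent.

KIB = 1024
MIB = 1024 * KIB

def tlb_reach(entries: int, page_size: int) -> int:
    """Total bytes of address space covered by a fully populated TLB."""
    return entries * page_size

entries = 1536                       # hypothetical last-level TLB capacity
base = tlb_reach(entries, 4 * KIB)   # classic 4 KiB base pages
huge = tlb_reach(entries, 2 * MIB)   # x86-64-style 2 MiB huge pages

print(f"4 KiB pages cover {base / MIB:.0f} MiB")  # 6 MiB
print(f"2 MiB pages cover {huge / MIB:.0f} MiB")  # 3072 MiB, 512x the coverage
```

The 512x factor is simply the ratio of the two page sizes, which is why each huge-page entry does the work of 512 base-page entries.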
-
Publication No.: US11609879B2
Publication Date: 2023-03-21
Application No.: US17365315
Filing Date: 2021-07-01
Applicant: NVIDIA CORPORATION
Inventor: Yaosheng Fu, Evgeny Bolotin, Niladrish Chatterjee, Stephen William Keckler, David Nellans
IPC: G06F15/78, G06F12/0811, G06F12/12, G06F13/40
Abstract: In various embodiments, a parallel processor includes a parallel processor module implemented within a first die and a memory system module implemented within a second die. The memory system module is coupled to the parallel processor module via an on-package link. The parallel processor module includes multiple processor cores and multiple cache memories. The memory system module includes a memory controller for accessing a DRAM. Advantageously, the memory system module allows the performance of the parallel processor module to be tailored to the memory bandwidth demands that typify one or more application domains.
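As a rough, hypothetical illustration of why a swappable memory-system die lets one compute die serve different application domains, the sketch below applies a simple roofline bound to one assumed compute die paired with memory dies of different bandwidths; every name and number here is invented, not taken from the patent.

```python
# Roofline-style sketch: attainable throughput is capped either by the compute
# die's peak FLOP/s or by the memory die's DRAM bandwidth. All module names
# and specs are hypothetical.

PEAK_TFLOPS = 100.0  # assumed peak of the parallel processor module (compute die)

# Hypothetical memory system modules pairable with the same compute die
# over the on-package link.
MEMORY_DIES_GBPS = {
    "high-bandwidth MSM": 3000.0,
    "mainstream MSM":     1200.0,
    "low-cost MSM":        600.0,
}

def attainable_tflops(flops_per_byte: float, bw_gbps: float) -> float:
    """Roofline bound: min(peak compute, memory bandwidth * arithmetic intensity)."""
    return min(PEAK_TFLOPS, bw_gbps * flops_per_byte / 1000.0)

for name, bw in MEMORY_DIES_GBPS.items():
    for ai in (1.0, 16.0, 64.0):  # FLOPs performed per byte fetched from DRAM
        print(f"{name:>18} @ AI={ai:>4.0f}: {attainable_tflops(ai, bw):6.1f} TFLOP/s")
```

Under this toy model, bandwidth-hungry workloads (low arithmetic intensity) only benefit from the high-bandwidth memory die, while compute-bound workloads hit the same compute ceiling on all three, which is the tailoring the abstract describes.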
-
Publication No.: US20230079978A1
Publication Date: 2023-03-16
Application No.: US17709720
Filing Date: 2022-03-31
Applicant: Nvidia Corporation
Inventor: Evgeny Bolotin, Yaosheng Fu, Zi Yan, Gal Dalal, Shie Mannor, David Nellans
Abstract: A system, method, and apparatus for power management of computing systems are disclosed herein that use machine learning to optimize the individual frequencies of a computing system's components. The computing systems can be tightly integrated systems that consider an overall operating budget shared between the components while adjusting the frequencies of the individual components. An example of an automated method of power management includes: (1) learning, using a power management (PM) agent, frequency settings for different components of a computing system during execution of a repetitive application, and (2) adjusting the frequency settings of the different components using the PM agent, wherein the adjusting is based on the repetitive application and one or more limitations corresponding to a shared operating budget for the computing system.
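The two-step method above lends itself to a simple learning loop. The sketch below is a heavily simplified, hypothetical stand-in for the PM agent (greedy hill-climbing rather than any particular RL algorithm); the power model and `measure_performance` stub are invented for illustration.

```python
import random

# Hypothetical PM-agent stand-in: nudge one component's frequency per step,
# keep the change only if measured performance improves and the shared power
# budget is still respected. Not the patent's mechanism.

COMPONENTS = ["gpu_core", "memory", "interconnect"]
STEP_MHZ = {"gpu_core": 100.0, "memory": 50.0, "interconnect": 50.0}
POWER_BUDGET_W = 300.0

def estimated_power(freqs: dict[str, float]) -> float:
    return 0.1 * sum(freqs.values())  # toy linear power model (assumed)

def measure_performance(freqs: dict[str, float]) -> float:
    # Stand-in for timing one run of the repetitive application; concave in
    # frequency so the shared budget, not the model, ends up binding.
    return sum(f ** 0.5 for f in freqs.values()) + random.gauss(0.0, 0.2)

freqs = {c: 800.0 for c in COMPONENTS}   # 240 W under the toy model
best = measure_performance(freqs)
for _ in range(500):                     # learning phase over repeated runs
    c = random.choice(COMPONENTS)
    trial = {**freqs, c: freqs[c] + random.choice((-1.0, 1.0)) * STEP_MHZ[c]}
    if min(trial.values()) <= 0 or estimated_power(trial) > POWER_BUDGET_W:
        continue                         # stay inside the shared operating budget
    perf = measure_performance(trial)
    if perf > best:                      # keep only helpful adjustments
        freqs, best = trial, perf

print({c: round(f) for c, f in freqs.items()}, f"~{estimated_power(freqs):.0f} W")
```

Because the application is repetitive, each loop iteration can reuse the same workload as its measurement period, which is what makes per-component tuning against a shared budget tractable at all.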
-
Publication No.: US20240303201A1
Publication Date: 2024-09-12
Application No.: US18118020
Filing Date: 2023-03-06
Applicant: NVIDIA Corporation
Inventor: Aninda Manocha, Zi Yan, David Nellans
IPC: G06F12/1027
CPC classification number: G06F12/1027
Abstract: Computer systems often employ virtual address translation hierarchies in which virtual memory addresses are mapped to physical memory. Use of the virtual address translation hierarchy speeds up virtual address translation when the required mapping is stored in one of the higher levels of the hierarchy. To reduce the number of misses occurring in the virtual address translation hierarchy, huge memory pages may be selectively employed; these map larger contiguous regions of virtual memory to contiguous regions of physical memory, thereby increasing the coverage of each entry in the virtual address translation hierarchy. The present disclosure provides hardware support for optimizing this huge memory page selection.
-
Publication No.: US11625279B2
Publication Date: 2023-04-11
Application No.: US16787967
Filing Date: 2020-02-11
Applicant: NVIDIA Corporation
Inventor: Daniel Lustig, Oreste Villa, David Nellans
IPC: G06F9/50, G06F11/30, G06F9/54, G06F12/1027, G06F11/07, G06F12/0882
Abstract: In general, an application executes on a compute unit, such as a central processing unit (CPU) or graphics processing unit (GPU), to perform some function(s). In some circumstances, improved performance of an application, such as a graphics application, may be achieved by executing the application across multiple compute units. However, when using multiple compute units in this manner, synchronization must be provided between the compute units. Synchronization, including the sharing of data, is typically accomplished through memory. While a shared memory may cause bottlenecks, employing local memory for each compute unit may itself require synchronization (coherence), which can be costly in terms of resources, delay, etc. The present disclosure provides read-write page replication for multiple compute units that avoids the traditional challenges associated with coherence.
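Purely as an illustration of page replication in general (the patent's actual hardware mechanism is not reproduced here), the sketch below models a generic replicate-on-read, invalidate-on-write scheme across compute units' local memories.

```python
# Generic replicate-on-read / invalidate-on-write model of page replication.
# An assumption-laden illustration, not the patent's hardware design.

class ReplicatedPages:
    def __init__(self) -> None:
        self.replicas: dict[int, dict[int, bytes]] = {}  # page -> {unit: data}
        self.home: dict[int, bytes] = {}                 # authoritative copy

    def read(self, unit: int, page: int) -> bytes:
        local = self.replicas.setdefault(page, {})
        if unit not in local:       # miss: replicate into this unit's local memory
            local[unit] = self.home.get(page, bytes(1))
        return local[unit]          # later reads hit the local replica

    def write(self, unit: int, page: int, data: bytes) -> None:
        # Drop every other unit's replica so no stale copy can be read,
        # then install the writer's copy and update the home copy.
        self.replicas[page] = {unit: data}
        self.home[page] = data

pages = ReplicatedPages()
pages.write(unit=0, page=7, data=b"hello")
print(pages.read(unit=1, page=7))   # b'hello', now replicated locally on unit 1
```

In this toy scheme, reads are always local after the first miss, and the cost of coherence is concentrated into the write path, which is one common way to trade bandwidth for synchronization.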
-
Publication No.: US11880261B2
Publication Date: 2024-01-23
Application No.: US17709720
Filing Date: 2022-03-31
Applicant: Nvidia Corporation
Inventor: Evgeny Bolotin, Yaosheng Fu, Zi Yan, Gal Dalal, Shie Mannor, David Nellans
CPC classification number: G06F1/324, G06F1/206, G06F11/3495
Abstract: A system, method, and apparatus for power management of computing systems are disclosed herein that use machine learning to optimize the individual frequencies of a computing system's components. The computing systems can be tightly integrated systems that consider an overall operating budget shared between the components while adjusting the frequencies of the individual components. An example of an automated method of power management includes: (1) learning, using a power management (PM) agent, frequency settings for different components of a computing system during execution of a repetitive application, and (2) adjusting the frequency settings of the different components using the PM agent, wherein the adjusting is based on the repetitive application and one or more limitations corresponding to a shared operating budget for the computing system.
-
Publication No.: US20230137205A1
Publication Date: 2023-05-04
Application No.: US17514735
Filing Date: 2021-10-29
Applicant: Nvidia Corporation
Inventor: Yaosheng Fu, Shie Mannor, Evgeny Bolotin, David Nellans, Gal Dalal
IPC: G06F12/123, G06N20/00, G06T1/60
Abstract: Introduced herein is a technique that uses ML to autonomously find a cache management policy that achieves optimal execution of a given workload of an application. Leveraging ML such as reinforcement learning, the technique trains an agent in an ML environment over multiple episodes of a stabilization process. At each time step in these training episodes, the agent executes the application while making an incremental change to the current policy, i.e., the cache-residency statuses of the memory address space associated with the workload, until the application can be executed at a stable level. A stable level of execution can be indicated, for example, by performance variations, such as standard deviations, between a certain number of neighboring measurement periods remaining within a certain threshold. The agent, which has been trained in the training episodes, infers the final cache management policy during a final inference episode.
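The stabilization criterion described above maps naturally onto a rolling standard-deviation test. The sketch below renders one hypothetical training episode: toggle one address range's cache-residency status per step, re-measure, and stop once performance across neighboring measurement periods is stable; `run_workload` and all thresholds are invented.

```python
import random
import statistics

# Hypothetical rendering of one training episode of the stabilization process.
NUM_RANGES = 16        # address ranges covered by the policy (assumed)
WINDOW = 8             # neighboring measurement periods to compare
STABLE_STDEV = 0.02    # stability threshold on measured runtime (assumed)

policy = [False] * NUM_RANGES   # True = keep this range resident in cache

def run_workload(policy: list[bool]) -> float:
    # Stand-in for executing the real workload under `policy` and timing it;
    # here, runtime improves as more hot ranges are kept resident, plus noise.
    return 1.0 - 0.3 * sum(policy) / NUM_RANGES + random.gauss(0.0, 0.01)

history: list[float] = []
while True:
    i = random.randrange(NUM_RANGES)
    policy[i] = not policy[i]            # incremental change to the current policy
    runtime = run_workload(policy)
    if history and runtime > history[-1]:
        policy[i] = not policy[i]        # greedy: revert changes that hurt
        continue
    history.append(runtime)
    if len(history) >= WINDOW and statistics.stdev(history[-WINDOW:]) < STABLE_STDEV:
        break                            # execution has reached a stable level

print(f"learned policy: {policy}")
```

The break condition is one concrete reading of the abstract's criterion: the standard deviation across the last WINDOW measurement periods must fall within a threshold before the episode ends.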
-
Publication No.: US20210248014A1
Publication Date: 2021-08-12
Application No.: US16787967
Filing Date: 2020-02-11
Applicant: NVIDIA Corporation
Inventor: Daniel Lustig, Oreste Villa, David Nellans
IPC: G06F9/50, G06F9/54, G06F12/0882, G06F12/1027, G06F11/07, G06F11/30
Abstract: In general, an application executes on a compute unit, such as a central processing unit (CPU) or graphics processing unit (GPU), to perform some function(s). In some circumstances, improved performance of an application, such as a graphics application, may be achieved by executing the application across multiple compute units. However, when using multiple compute units in this manner, synchronization must be provided between the compute units. Synchronization, including the sharing of data, is typically accomplished through memory. While a shared memory may cause bottlenecks, employing local memory for each compute unit may itself require synchronization (coherence), which can be costly in terms of resources, delay, etc. The present disclosure provides read-write page replication for multiple compute units that avoids the traditional challenges associated with coherence.