-
Publication number: US20210191754A1
Publication date: 2021-06-24
Application number: US16722422
Application date: 2019-12-20
Applicant: Nvidia Corporation
Inventor: Jonathon Evans , Lacky Shah , Phil Johnson , Jonah Alben , Brian Pharris , Greg Palmer , Brian Fahs
Abstract: Apparatuses, systems, and techniques to optimize processor resources at a user-defined level. In at least one embodiment, the priority of one or more tasks is adjusted to prevent one or more other dependent tasks from entering an idle state due to a lack of resources to consume.
-
Publication number: US09830262B2
Publication date: 2017-11-28
Application number: US14133488
Application date: 2013-12-18
Applicant: NVIDIA CORPORATION
Inventor: Jerome F. Duluk, Jr. , Cameron Buschardt , James Leroy Deming , Brian Fahs
CPC classification number: G06F12/08 , G06F11/3037 , G06F11/3442 , G06F11/3471 , G06F2201/81 , G06F2201/815 , G06F2201/88 , G06F2212/205
Abstract: Embodiments of the approaches disclosed herein include a subsystem that includes an access tracking mechanism configured to monitor access operations directed to a first memory and a second memory. The access tracking mechanism detects an access operation generated by a processor for accessing a first memory page residing on the second memory. The access tracking mechanism further determines that the first memory page is included in a first subset of memory pages residing on the second memory. The access tracking mechanism further locates, within a reference vector, a reference bit that corresponds to the first memory page, and sets the reference bit. One advantage of the present invention is that memory pages in a hybrid system migrate as needed to increase overall memory performance.
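The access-tracking mechanism described in this abstract can be illustrated with a minimal sketch. All names here are hypothetical (the patent publishes no code): a tracker keeps one reference bit per page in the tracked subset and sets the bit when the processor accesses that page.

```python
# Hypothetical sketch of an access tracker that maintains one reference
# bit per tracked memory page, per the abstract above.

class AccessTracker:
    def __init__(self, tracked_pages):
        # Pages in the "first subset" residing on the second memory.
        self.tracked = {page: i for i, page in enumerate(tracked_pages)}
        self.reference_vector = [0] * len(tracked_pages)

    def on_access(self, page):
        # Locate the reference bit for this page and set it; accesses to
        # untracked pages leave the reference vector unchanged.
        idx = self.tracked.get(page)
        if idx is None:
            return False
        self.reference_vector[idx] = 1
        return True

tracker = AccessTracker([0x1000, 0x2000, 0x3000])
tracker.on_access(0x2000)
print(tracker.reference_vector)  # [0, 1, 0]
```

The reference vector is what a migration policy would later consult to decide which pages to move between the two memories.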
-
Publication number: US09417875B2
Publication date: 2016-08-16
Application number: US14025482
Application date: 2013-09-12
Applicant: NVIDIA Corporation
Inventor: Brian Fahs , Ming Y. Siu , Brett W. Coon , John R. Nickolls , Lars Nyland
CPC classification number: G06F9/522 , G06F8/458 , G06F9/3004 , G06F9/30087 , G06F9/30145 , G06F9/3851
Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction, the thread contributes to a scan or reduction result and waits to execute any further instructions until all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction, and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
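The barrier-plus-reduction behavior can be emulated in software with a barrier whose completion action performs the aggregation. This is only an illustrative sketch of the semantics, not the patented hardware instruction:

```python
import threading

# Each thread supplies a value, then waits at the barrier; the reduction
# (here, a sum) runs once after all threads arrive, and every thread can
# then observe the result, mirroring the "barrier aggregation" semantics.

N = 4
contributions = [0] * N
result = []

def publish():
    # Barrier action: executed exactly once, after all N threads arrive.
    result.append(sum(contributions))

barrier = threading.Barrier(N, action=publish)

def worker(tid, value):
    contributions[tid] = value   # supply this thread's value
    barrier.wait()               # synchronize and aggregate
    # After the wait returns, every thread sees the reduction result.

threads = [threading.Thread(target=worker, args=(i, i + 1)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result[0])  # 10
```

In the patented design the aggregation is fused into the barrier instruction itself, so no separate reduction pass over memory is needed.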
-
Publication number: US20140168245A1
Publication date: 2014-06-19
Application number: US13720745
Application date: 2012-12-19
Applicant: NVIDIA CORPORATION
Inventor: Brian Fahs , Eric T. Anderson , Nick Barrow-Williams , Shirish Gadre , Joel James McCormack , Bryon S. Nordquist , Nirmal Raj Saxena , Lacky V. Shah
IPC: G06F13/14
CPC classification number: G06F13/14 , G06T1/20 , G06T1/60 , G06T15/005 , G06T2210/36
Abstract: A texture processing pipeline can be configured to service memory access requests that represent texture data access operations or generic data access operations. When the texture processing pipeline receives a memory access request that represents a texture data access operation, the texture processing pipeline may retrieve texture data based on texture coordinates. When the memory access request represents a generic data access operation, the texture pipeline extracts a virtual address from the memory access request and then retrieves data based on the virtual address. The texture processing pipeline is also configured to cache generic data retrieved on behalf of a group of threads and to then invalidate that generic data when the group of threads exits.
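The dual-mode dispatch the abstract describes can be sketched as a small request router. The request fields and function names below are hypothetical stand-ins for the pipeline's internal signals:

```python
# Hypothetical sketch: one pipeline services both texture requests
# (resolved via texture coordinates) and generic requests (resolved via a
# virtual address carried in the request itself).

def service(request, texture_lookup, memory):
    if request["kind"] == "texture":
        # Texture data access: resolve through texture coordinates.
        return texture_lookup(request["coords"])
    # Generic data access: extract the virtual address, then load.
    return memory[request["vaddr"]]

memory = {0x40: "payload"}
tex = lambda coords: ("texel", coords)
print(service({"kind": "texture", "coords": (2, 3)}, tex, memory))
print(service({"kind": "generic", "vaddr": 0x40}, tex, memory))
```

The cache-invalidation-on-thread-group-exit behavior is omitted here; the sketch only shows the two request paths sharing one entry point.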
-
Publication number: US11954518B2
Publication date: 2024-04-09
Application number: US16722422
Application date: 2019-12-20
Applicant: Nvidia Corporation
Inventor: Jonathon Evans , Lacky Shah , Phil Johnson , Jonah Alben , Brian Pharris , Greg Palmer , Brian Fahs
CPC classification number: G06F9/4831 , G06N3/08
Abstract: Apparatuses, systems, and techniques to optimize processor resources at a user-defined level. In at least one embodiment, the priority of one or more tasks is adjusted to prevent one or more other dependent tasks from entering an idle state due to a lack of resources to consume.
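The priority-adjustment idea can be sketched as a small rebalancing pass: when a dependent task is close to running out of input, the priority of its producer is raised so the consumer does not go idle. Task fields and the low-water threshold below are hypothetical illustrations, not the patented mechanism:

```python
# Illustrative sketch: boost the priority of a producer task whenever its
# dependent consumer is about to starve for input.

def rebalance(tasks, low_water=1):
    for task in tasks:
        producer = task.get("producer")
        if producer is not None and task["queued_inputs"] <= low_water:
            producer["priority"] += 1  # keep the consumer fed

producer = {"name": "encode", "priority": 0}
consumer = {"name": "decode", "queued_inputs": 0, "producer": producer}
rebalance([consumer])
print(producer["priority"])  # 1
```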
-
Publication number: US10061526B2
Publication date: 2018-08-28
Application number: US15169532
Application date: 2016-05-31
Applicant: NVIDIA Corporation
Inventor: John Mashey , Cameron Buschardt , James Leroy Deming , Jerome F. Duluk, Jr. , Brian Fahs
IPC: G06F3/06 , G06F12/1027 , G06F12/1009
CPC classification number: G06F3/0622 , G06F3/0631 , G06F3/0647 , G06F3/0685 , G06F12/1009 , G06F12/1027 , G06F2212/656 , G06F2212/684
Abstract: One embodiment of the present invention is a memory subsystem that includes a sliding window tracker that tracks memory accesses associated with a sliding window of memory page groups. When the sliding window tracker detects an access operation associated with a memory page group within the sliding window, the sliding window tracker sets a reference bit that is associated with the memory page group and is included in a reference vector that represents accesses to the memory page groups within the sliding window. Based on the values of the reference bits, the sliding window tracker causes a memory page in a memory page group that has fallen into disuse to be migrated from a first memory to a second memory. Because the sliding window tracker tunes the memory pages that are resident in the first memory to reflect memory access patterns, the overall performance of the memory subsystem is improved.
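A minimal sketch of the sliding-window idea, with hypothetical names: page groups inside the window get a reference bit on access, and a group whose bit is still clear is a candidate for migration out of the first memory.

```python
# Sketch of sliding-window access tracking over memory page groups.

class SlidingWindowTracker:
    def __init__(self, groups_in_window):
        self.window = list(groups_in_window)
        self.bits = {g: 0 for g in self.window}  # the reference vector

    def on_access(self, group):
        # Set the reference bit only for groups currently in the window.
        if group in self.bits:
            self.bits[group] = 1

    def migration_candidates(self):
        # Groups that fell into disuse: never accessed while windowed.
        return [g for g in self.window if self.bits[g] == 0]

t = SlidingWindowTracker(["g0", "g1", "g2"])
t.on_access("g1")
print(t.migration_candidates())  # ['g0', 'g2']
```

Sliding the window forward (dropping old groups, admitting new ones, and resetting their bits) is omitted for brevity.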
-
Publication number: US09830276B2
Publication date: 2017-11-28
Application number: US15437400
Application date: 2017-02-20
Applicant: NVIDIA Corporation
Inventor: James Leroy Deming , Jerome F. Duluk, Jr. , John Mashey , Mark Hairgrove , Lucien Dunning , Jonathon Stuart Ramsey Evans , Samuel H. Duncan , Cameron Buschardt , Brian Fahs
IPC: G06F12/08 , G06F9/46 , G06F12/1027
CPC classification number: G06F12/1027 , G06F9/467 , G06F12/08 , G06F2212/301 , G06F2212/684
Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.
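The per-SM replay behavior can be sketched as follows (the class and its methods are hypothetical illustrations): a fault stalls only the affected SM, faulting transactions accumulate in a replay buffer, and the buffer is drained as faults are resolved.

```python
# Hedged sketch of per-SM fault replay, per the abstract above.

class ReplayUnit:
    def __init__(self):
        self.buffer = []
        self.stalled = False

    def on_fault(self, transaction):
        self.stalled = True              # only this SM stops issuing
        self.buffer.append(transaction)  # park the faulting transaction

    def replay(self, try_execute):
        # Retry buffered transactions; keep only the ones still failing.
        self.buffer = [t for t in self.buffer if not try_execute(t)]
        if not self.buffer:
            self.stalled = False         # resume once the buffer drains

unit = ReplayUnit()
unit.on_fault("load @0x10")
unit.replay(lambda t: True)  # fault resolved; the retry succeeds
print(unit.stalled, unit.buffer)  # False []
```

The key property the abstract claims is visible here: other (unaffected) SMs never interact with this unit and are never stalled.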
-
Publication number: US09639474B2
Publication date: 2017-05-02
Application number: US14134148
Application date: 2013-12-19
Applicant: NVIDIA CORPORATION
Inventor: Jerome F. Duluk, Jr. , John Mashey , Mark Hairgrove , Chenghuan Jia , Cameron Buschardt , Lucien Dunning , Brian Fahs
IPC: G06F13/00 , G06F12/1009 , G06F12/0804
CPC classification number: G06F3/0604 , G06F3/0647 , G06F3/0664 , G06F12/0804 , G06F12/1009 , G06F13/4022 , G06F13/4282 , G06F2212/657
Abstract: Techniques are provided by which memory pages may be migrated among PPU memories in a multi-PPU system. According to the techniques, a UVM driver determines that a particular memory page should change ownership state and/or be migrated between one PPU memory and another PPU memory. In response to this determination, the UVM driver initiates a peer transition sequence to cause the ownership state and/or location of the memory page to change. Various peer transition sequences involve modifying mappings for one or more PPUs and copying a memory page from one PPU memory to another PPU memory. Several steps in peer transition sequences may be performed in parallel for increased processing speed.
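A peer transition can be sketched as two steps: copy the page between PPU memories, then update every PPU's mapping to point at the new owner. The dict-based memories and mapping tables below are hypothetical stand-ins for the driver's data structures:

```python
# Illustrative sketch of a peer transition sequence in a two-PPU system.

def migrate(page, src_mem, dst_mem, mappings, new_owner):
    dst_mem[page] = src_mem.pop(page)    # copy, then release the source copy
    for ppu in mappings:                 # remap every PPU to the new owner
        mappings[ppu][page] = new_owner
    return dst_mem[page]

ppu0_mem = {"pageA": b"data"}
ppu1_mem = {}
mappings = {"ppu0": {"pageA": "ppu0"}, "ppu1": {"pageA": "ppu0"}}
migrate("pageA", ppu0_mem, ppu1_mem, mappings, "ppu1")
print(mappings["ppu0"]["pageA"], "pageA" in ppu0_mem)  # ppu1 False
```

In the real driver the copy and the per-PPU remappings can proceed in parallel, which is the speedup the abstract highlights.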
-
Publication number: US09588903B2
Publication date: 2017-03-07
Application number: US14011655
Application date: 2013-08-27
Applicant: NVIDIA CORPORATION
Inventor: Cameron Buschardt , Jerome F. Duluk, Jr. , John Mashey , Mark Hairgrove , James Leroy Deming , Brian Fahs
Abstract: One embodiment of the present invention includes a microcontroller coupled to a memory management unit (MMU). The MMU is coupled to a page table included in a physical memory, and the microcontroller is configured to perform one or more virtual memory operations associated with the physical memory and the page table. In operation, the microcontroller receives a page fault generated by the MMU in response to an invalid memory access via a virtual memory address. To remedy such a page fault, the microcontroller performs actions to map the virtual memory address to an appropriate location in the physical memory. By contrast, in prior-art systems, a fault handler would typically remedy the page fault. Advantageously, because the microcontroller executes these tasks locally with respect to the MMU and the physical memory, latency associated with remedying page faults may be decreased. Consequently, overall system performance may be increased.
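The microcontroller's remedy for a page fault can be sketched as mapping the faulting virtual address to a free physical page so the retried access succeeds. The page-table dict and free list are hypothetical simplifications:

```python
# Sketch of fault handling local to the MMU: map the faulting virtual
# address to an available physical page, then return the mapping.

def handle_fault(page_table, vaddr, free_pages):
    if vaddr not in page_table:          # the invalid access that faulted
        page_table[vaddr] = free_pages.pop()
    return page_table[vaddr]

page_table = {}
free = [0x9000]
print(hex(handle_fault(page_table, 0x1234, free)))  # 0x9000
```

Handling this next to the MMU, rather than bouncing out to a remote fault handler, is what reduces the latency the abstract describes.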
-
Publication number: US20200334076A1
Publication date: 2020-10-22
Application number: US16389548
Application date: 2019-04-19
Applicant: Nvidia Corporation
Inventor: Brian Fahs , Michael Lightstone , Mostafa Hagog
Abstract: An application binary interface (ABI) can be exposed in a processor to enable blocks of threads, which may correspond to separately compiled operators, to communicate without storing data to global memory external to the processor. The ABI can define how results of one computation, corresponding to a first thread block, will be organized in registers and shared memory of a processor at the end of one operator (i.e., kernel). The start of the next operator (i.e., kernel), corresponding to a second thread block, can consume the results from the registers and shared memory. Data can be stored to processor local storage for individual threads as they exit the block. Once published, libraries can be separately compiled, optimized, and tested as long as they adhere to the published ABI.
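The core idea, two separately compiled operators exchanging results through a published slot layout instead of a round trip through global memory, can be sketched in miniature. The `ABI` dict below is a hypothetical stand-in for the published register/shared-memory layout:

```python
# Hedged sketch: producer and consumer operators agree on an "ABI" that
# says where the result lives, so they compose without external storage.

ABI = {"result": 0}  # published layout: operator output lives in slot 0

def op_a(slots, x):
    slots[ABI["result"]] = x * x         # producer writes per the ABI

def op_b(slots):
    return slots[ABI["result"]] + 1      # consumer reads per the ABI

slots = [None]    # stands in for registers / shared memory
op_a(slots, 3)
print(op_b(slots))  # 10
```

As the abstract notes, either operator can be recompiled, optimized, or tested independently as long as it keeps honoring the published layout.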