Patent search: ap:("NVIDIA CORPORATION") AND inv:"Jerome F. Duluk Jr."

41.

Granted invention patent
Replaying memory transactions while resolving memory access faults (in force)

Publication (announcement) No.: US09575892B2

Publication (announcement) date: 2017-02-21

Application No.: US14109678

Application date: 2013-12-17

Applicant: NVIDIA CORPORATION

Inventor: James Leroy Deming; Jerome F. Duluk, Jr.; John Mashey; Mark Hairgrove; Lucien Dunning; Jonathon Stuart Ramsey Evans; Samuel H. Duncan; Cameron Buschardt; Brian Fahs

IPC: G06F12/08, G06F9/46

CPC classification number: G06F12/1027, G06F9/467, G06F12/08, G06F2212/301, G06F2212/684

Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.
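The abstract describes the per-SM replay unit only at a high level. The following is a minimal Python sketch, assuming a single-threaded toy model in which a page is either resident or not. `ReplayUnit`, `Transaction`, `PAGE_SIZE` and the retry loop are illustrative names, not NVIDIA's hardware design. It shows a fault pausing only the owning SM, and buffered transactions being retried and dropped as they succeed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Iterable

PAGE_SIZE = 4096  # hypothetical page size for the toy model


@dataclass
class Transaction:
    sm_id: int
    address: int


@dataclass
class ReplayUnit:
    """Toy per-SM replay unit: a fault pauses only the SM that owns this unit."""
    sm_id: int
    issuing_enabled: bool = True
    replay_buffer: Deque[Transaction] = field(default_factory=deque)

    def on_fault(self, faulting: Transaction, in_flight_faulting: Iterable[Transaction]) -> None:
        self.issuing_enabled = False              # this SM stops issuing new transactions
        self.replay_buffer.append(faulting)
        self.replay_buffer.extend(in_flight_faulting)

    def replay(self, is_resident: Callable[[int], bool]) -> int:
        """Retry each buffered transaction once; drop those that now succeed."""
        for _ in range(len(self.replay_buffer)):
            txn = self.replay_buffer.popleft()
            if not is_resident(txn.address // PAGE_SIZE):
                self.replay_buffer.append(txn)    # still faults: keep it for the next replay
        if not self.replay_buffer:
            self.issuing_enabled = True           # everything succeeded: resume issuing
        return len(self.replay_buffer)


# Two SMs; SM 0 faults on pages 2 and 3 while SM 1 is unaffected.
units = {sm: ReplayUnit(sm) for sm in range(2)}
resident_pages = {0, 1}
units[0].on_fault(Transaction(0, 2 * PAGE_SIZE), [Transaction(0, 3 * PAGE_SIZE + 64)])
sm0_paused = not units[0].issuing_enabled
sm1_issuing = units[1].issuing_enabled

resident_pages.add(2)                      # first fault resolved
pending_after_first = units[0].replay(lambda page: page in resident_pages)
resident_pages.add(3)                      # second fault resolved
pending_after_second = units[0].replay(lambda page: page in resident_pages)
```

The design point the sketch illustrates is scope: only `units[0]` stops issuing, which is the advantage the abstract claims over stalling every SM.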


42.

Granted invention patent
Migration directives in a unified virtual memory system architecture (in force)

Publication (announcement) No.: US09430400B2

Publication (announcement) date: 2016-08-30

Application No.: US14109712

Application date: 2013-12-17

Applicant: NVIDIA CORPORATION

Inventor: Jerome F. Duluk, Jr.

IPC: G06F12/00, G06F12/10, G06F12/08

CPC classification number: G06F12/1027, G06F12/08, G06F12/1009

Abstract: One embodiment of the present invention sets forth a computer-implemented method for altering migration rules for a unified virtual memory system. The method includes detecting that a migration rule trigger has been satisfied. The method also includes identifying a migration rule action that is associated with the migration rule trigger. The method further includes executing the migration rule action. Other embodiments of the present invention include a computer-readable medium, a computing device, and a unified virtual memory subsystem. One advantage of the disclosed approach is that various settings of the unified virtual memory system may be modified during program execution. This ability to alter the settings allows for an application to vary the manner in which memory pages are migrated and otherwise manipulated, which provides the application the ability to optimize the unified virtual memory system for efficient execution.
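The trigger/action structure in this abstract can be pictured as a small rule table. This is a minimal Python sketch under stated assumptions: `MigrationRule`, `apply_migration_rules` and the settings keys (`migrate_on_fault`, `free_mib`, `evict_cold_pages`) are hypothetical names, not the patent's or any driver's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class MigrationRule:
    """Hypothetical rule: a trigger predicate paired with the action it enables."""
    name: str
    trigger: Callable[[State], bool]
    action: Callable[[State], None]


def apply_migration_rules(rules: List[MigrationRule], state: State) -> List[str]:
    """Run every rule whose trigger is satisfied; return the names that fired."""
    fired = []
    for rule in rules:
        if rule.trigger(state):      # detect that the trigger has been satisfied
            rule.action(state)       # execute the action associated with that trigger
            fired.append(rule.name)
    return fired


# Made-up settings: after the first kernel launch, stop migrating on first touch;
# a second rule would request eviction only under memory pressure.
uvm_state: State = {"kernel_launched": True, "migrate_on_fault": True, "free_mib": 512}
rules = [
    MigrationRule("no-migrate-after-launch",
                  trigger=lambda s: bool(s["kernel_launched"]),
                  action=lambda s: s.update(migrate_on_fault=False)),
    MigrationRule("evict-when-low",
                  trigger=lambda s: s["free_mib"] < 64,
                  action=lambda s: s.update(evict_cold_pages=True)),
]
fired = apply_migration_rules(rules, uvm_state)
```

Keeping the trigger and action together in one record is one plausible reading of "identifying a migration rule action that is associated with the migration rule trigger"; the abstract does not say how the association is stored.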


43.

Granted invention patent
Migrating pages of different sizes between heterogeneous processors (in force)

Publication (announcement) No.: US09424201B2

Publication (announcement) date: 2016-08-23

Application No.: US14134142

Application date: 2013-12-19

Applicant: NVIDIA CORPORATION

Inventor: Jerome F. Duluk, Jr.; Cameron Buschardt; James Leroy Deming; Lucien Dunning; Brian Fahs; Mark Hairgrove; Chenghuan Jia; John Mashey; James M. Van Dyke

IPC: G06F12/00, G06F13/00, G06F12/10, G06F12/08, G06F12/12

CPC classification number: G06F3/0647, G06F3/061, G06F3/0655, G06F3/0683, G06F12/08, G06F12/1009, G06F12/122, G06F2212/652

Abstract: One embodiment of the present invention sets forth a computer-implemented method for migrating a memory page from a first memory to a second memory. The method includes determining a first page size supported by the first memory. The method also includes determining a second page size supported by the second memory. The method further includes determining a use history of the memory page based on an entry in a page state directory associated with the memory page. The method also includes migrating the memory page between the first memory and the second memory based on the first page size, the second page size, and the use history.
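The abstract names the three inputs to the migration decision (source page size, destination page size, use history from the page state directory) but not the policy that combines them. This is a minimal Python sketch with a made-up policy. `PageStateEntry`, `choose_migration_size` and `hot_threshold` are hypothetical.

```python
from dataclasses import dataclass

KiB = 1024
MiB = 1024 * KiB


@dataclass
class PageStateEntry:
    """Hypothetical page-state-directory entry: recent access counts per side."""
    dst_accesses: int
    src_accesses: int


def choose_migration_size(src_page_size: int, dst_page_size: int,
                          entry: PageStateEntry, hot_threshold: int = 4) -> int:
    """Return the number of bytes to migrate under a made-up policy."""
    small = min(src_page_size, dst_page_size)
    large = max(src_page_size, dst_page_size)
    # If the destination clearly dominates recent use, move a whole large page;
    # otherwise move only the smallest unit so pages the other processor is
    # still using are not dragged along.
    if entry.dst_accesses >= hot_threshold and entry.dst_accesses > entry.src_accesses:
        return large
    return small


hot = choose_migration_size(4 * KiB, 2 * MiB, PageStateEntry(dst_accesses=10, src_accesses=1))
shared = choose_migration_size(4 * KiB, 2 * MiB, PageStateEntry(dst_accesses=3, src_accesses=3))
small_pages_in_hot_move = hot // (4 * KiB)   # how many 4 KiB source pages one large move gathers
```

With a 4 KiB source page and a 2 MiB destination page, the heavily used case moves 2 MiB (512 small pages) at once, while the shared case moves a single 4 KiB page.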


44.

Granted invention patent
Managing per-tile event count reports in a tile-based architecture (in force)

Publication (announcement) No.: US09311097B2

Publication (announcement) date: 2016-04-12

Application No.: US14061409

Application date: 2013-10-23

Applicant: NVIDIA Corporation

Inventor: Ziyad S. Hakura; Jerome F. Duluk, Jr.

IPC: G06F9/38, G06T15/00, G06T15/40, G06T1/20, G06T1/60, G09G5/395, G09G5/00, G06T15/50, G06F12/08, G06T15/80, G06F9/44

CPC classification number: G06T1/20, G06F9/38, G06F9/44, G06F12/0808, G06F12/0875, G06F2212/302, G06T1/60, G06T15/005, G06T15/405, G06T15/503, G06T15/80, G06T17/20, G09G5/003, G09G5/395, Y02D10/13

Abstract: A graphics processing system configured to track per-tile event counts in a tile-based architecture. A tiling unit in the graphics processing system is configured to cause a screen-space pipeline to load a count value associated with a first cache tile into a count memory and to cause the screen-space pipeline to process a first set of primitives that intersect the first cache tile. The tiling unit is further configured to cause the screen-space pipeline to store a second count value in a report memory location. The tiling unit is also configured to cause the screen-space pipeline to process a second set of primitives that intersect the first cache tile and to cause the screen-space pipeline to store a third count value in the first accumulating memory. Conditional rendering operations may be performed on a per-cache tile basis, based on the per-tile event count.
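The key idea here is that a single hardware counter is loaded with a cache tile's saved value before that tile's primitives are processed and written back afterwards. This lets each tile keep its own running count across passes. This is a minimal Python sketch assuming the counted event is "a sample passes a depth test" (an occlusion-query-style count). The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable


class ScreenSpacePipeline:
    """Toy model: one on-chip counter that is swapped in and out per cache tile."""

    def __init__(self) -> None:
        self.count_memory = 0
        self.saved_counts: DefaultDict[str, int] = defaultdict(int)  # per-tile values

    def process_tile(self, tile: str, sample_depths: Iterable[float], depth_limit: float) -> None:
        self.count_memory = self.saved_counts[tile]       # load this tile's count
        for depth in sample_depths:
            if depth < depth_limit:                       # the counted event: a visible sample
                self.count_memory += 1
        self.saved_counts[tile] = self.count_memory       # store it back for later passes


pipe = ScreenSpacePipeline()
pipe.process_tile("tile0", [0.2, 0.9], depth_limit=0.5)   # first set of primitives
pipe.process_tile("tile1", [0.8, 0.7], depth_limit=0.5)
pipe.process_tile("tile0", [0.1], depth_limit=0.5)        # second set for tile0 accumulates

# Conditional rendering per cache tile: draw only where the query counted samples.
tiles_to_draw = [t for t in ("tile0", "tile1") if pipe.saved_counts[t] > 0]
```

Because the count is kept per tile rather than per frame, the conditional draw can be skipped in `tile1` while still running in `tile0`.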


45.

Granted invention patent
Techniques for reconfiguring partitions in a parallel processing system (in force)

Publication (announcement) No.: US11579925B2

Publication (announcement) date: 2023-02-14

Application No.: US16562364

Application date: 2019-09-05

Applicant: NVIDIA CORPORATION

Inventor: Jerome F. Duluk, Jr.; Gregory Scott Palmer; Jonathon Stuart Ramsey Evans; Shailendra Singh; Samuel H. Duncan; Wishwesh Anil Gandhi; Lacky V. Shah; Eric Rock; Feiqi Su; James Leroy Deming; Alan Menezes; Pranav Vaidya; Praveen Joginipally; Timothy John Purcell; Manas Mandal

IPC: G06F9/46, G06F9/50

Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.
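Partitioning as described here amounts to handing out disjoint subsets of compute and memory resources and binding each guest to exactly one subset. This is a minimal Python sketch assuming resources are numbered compute units and memory slices. `PartitionedPPU` and its methods are hypothetical and do not describe NVIDIA's driver interface.

```python
from typing import Dict, FrozenSet, List, Tuple

Resources = Tuple[FrozenSet[int], FrozenSet[int]]   # (compute unit ids, memory slice ids)


class PartitionedPPU:
    """Toy model: carve disjoint sets of compute units and memory slices."""

    def __init__(self, num_compute_units: int, num_memory_slices: int) -> None:
        self.free_compute: List[int] = list(range(num_compute_units))
        self.free_memory: List[int] = list(range(num_memory_slices))
        self.partitions: Dict[str, Resources] = {}
        self.guests: Dict[str, str] = {}              # guest -> partition name

    def create_partition(self, name: str, n_compute: int, n_memory: int) -> None:
        # Admin-side operation, performed by software running on the CPU.
        if n_compute > len(self.free_compute) or n_memory > len(self.free_memory):
            raise ValueError("not enough free resources")
        compute = frozenset(self.free_compute[:n_compute])
        memory = frozenset(self.free_memory[:n_memory])
        del self.free_compute[:n_compute]
        del self.free_memory[:n_memory]
        self.partitions[name] = (compute, memory)

    def assign_guest(self, guest: str, name: str) -> None:
        if name not in self.partitions:
            raise KeyError(name)
        if name in self.guests.values():
            raise ValueError("partition already has a guest")
        self.guests[guest] = name

    def resources_for(self, guest: str) -> Resources:
        # A guest only ever sees the resources of its own partition.
        return self.partitions[self.guests[guest]]


ppu = PartitionedPPU(num_compute_units=8, num_memory_slices=8)
ppu.create_partition("p0", 4, 4)
ppu.create_partition("p1", 4, 4)
ppu.assign_guest("vm-a", "p0")
ppu.assign_guest("vm-b", "p1")
a_compute, a_memory = ppu.resources_for("vm-a")
b_compute, b_memory = ppu.resources_for("vm-b")
try:
    ppu.create_partition("p2", 1, 1)   # nothing left to carve out
    overcommitted = True
except ValueError:
    overcommitted = False
```

Isolation falls out of the disjoint sets: no compute unit or memory slice ever appears in two partitions.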

46.

Granted invention patent
Techniques for configuring a processor to function as multiple, separate processors (in force)

Publication (announcement) No.: US11249905B2

Publication (announcement) date: 2022-02-15

Application No.: US16562361

Application date: 2019-09-05

Applicant: NVIDIA CORPORATION

Inventor: Jerome F. Duluk, Jr.; Gregory Scott Palmer; Jonathon Stuart Ramsey Evans; Shailendra Singh; Samuel H. Duncan; Wishwesh Anil Gandhi; Lacky V. Shah; Eric Rock; Feiqi Su; James Leroy Deming; Alan Menezes; Pranav Vaidya; Praveen Joginipally; Timothy John Purcell; Manas Mandal

IPC: G06F12/06, G06F9/50

Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.

47.

Granted invention patent
PCIe traffic tracking hardware in a unified virtual memory system (in force)

Publication (announcement) No.: US11210253B2

Publication (announcement) date: 2021-12-28

Application No.: US16450830

Application date: 2019-06-24

Applicant: NVIDIA CORPORATION

Inventor: Jerome F. Duluk, Jr.; Cameron Buschardt; James Leroy Deming; Brian Fahs; Mark Hairgrove; John Mashey

IPC: G06F13/40, G06F12/123, G06F12/0864, G06F12/0806, G06F12/0875

Abstract: Techniques are disclosed for tracking memory page accesses in a unified virtual memory system. An access tracking unit detects a memory page access generated by a first processor for accessing a memory page in a memory system of a second processor. The access tracking unit determines whether a cache memory includes an entry for the memory page. If so, then the access tracking unit increments an associated access counter. Otherwise, the access tracking unit attempts to find an unused entry in the cache memory that is available for allocation. If so, then the access tracking unit associates the second entry with the memory page, and sets an access counter associated with the second entry to an initial value. Otherwise, the access tracking unit selects a valid entry in the cache memory; clears an associated valid bit; associates the entry with the memory page; and initializes an associated access counter.
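The three-way decision in this abstract (hit, allocate an unused entry, or evict a valid one) maps directly onto a small counter cache. This is a minimal Python sketch. The abstract does not say which valid entry is chosen for eviction, so the round-robin victim pointer here is an assumption, as are the names `AccessTracker` and `initial_count`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    valid: bool = False
    page: Optional[int] = None
    counter: int = 0


class AccessTracker:
    """Toy access-counter cache keyed by memory page number."""

    def __init__(self, num_entries: int, initial_count: int = 1) -> None:
        self.entries: List[Entry] = [Entry() for _ in range(num_entries)]
        self.initial_count = initial_count
        self.next_victim = 0   # assumed round-robin replacement; the abstract leaves this open

    def record_access(self, page: int) -> int:
        # 1. Hit: increment the counter already associated with this page.
        for e in self.entries:
            if e.valid and e.page == page:
                e.counter += 1
                return e.counter
        # 2. Miss: use an unused (invalid) entry if one is available.
        for e in self.entries:
            if not e.valid:
                self._fill(e, page)
                return e.counter
        # 3. No free entry: select a valid entry, clear its valid bit, and reuse it.
        victim = self.entries[self.next_victim]
        self.next_victim = (self.next_victim + 1) % len(self.entries)
        victim.valid = False
        self._fill(victim, page)
        return victim.counter

    def _fill(self, e: Entry, page: int) -> None:
        e.valid = True
        e.page = page
        e.counter = self.initial_count


tracker = AccessTracker(num_entries=2)
counts = [tracker.record_access(p) for p in (10, 10, 20, 30)]   # page 30 evicts page 10
tracked = sorted(e.page for e in tracker.entries)
```

A real tracker would compare the counter against a threshold to decide when remote pages are hot enough to migrate; that step is outside what the abstract describes.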

48.

Granted invention patent
Dynamic partitioning of execution resources (in force)

Publication (announcement) No.: US10817338B2

Publication (announcement) date: 2020-10-27

Application No.: US15885761

Application date: 2018-01-31

Applicant: NVIDIA Corporation

Inventor: Jerome F. Duluk, Jr.; Luke Durant; Ramon Matas Navarro; Alan Menezes; Jeffrey Tuckey; Gentaro Hirota; Brian Pharris

IPC: G06F9/50, G06F12/02, G06T1/60, G06T1/20

Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
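The launch rule in this abstract works like this: a credit is needed to acquire a new processor, the least-loaded processor is preferred, and already-acquired TPCs with space can still take CTAs when credits run out. This is a minimal Python sketch under stated assumptions. `ComputeWorkDistributor`, `capacity`, the credit bookkeeping and the tie-break by lowest TPC index are hypothetical choices, not NVIDIA's scheduler.

```python
from typing import Dict, List, Optional, Set


class ComputeWorkDistributor:
    """Toy model of credit-gated, least-loaded CTA placement across TPCs."""

    def __init__(self, num_tpcs: int, capacity: int) -> None:
        self.load: List[int] = [0] * num_tpcs     # CTAs resident on each TPC
        self.capacity = capacity                  # hypothetical max CTAs per TPC
        self.credits: Dict[str, int] = {}         # subcontext -> remaining credits
        self.acquired: Dict[str, Set[int]] = {}   # subcontext -> TPCs it holds

    def add_subcontext(self, sub: str, credits: int) -> None:
        self.credits[sub] = credits
        self.acquired[sub] = set()

    def launch_cta(self, sub: str) -> Optional[int]:
        owned = self.acquired[sub]
        # With a credit, any TPC is eligible; without one, only TPCs already acquired.
        pool = range(len(self.load)) if self.credits[sub] > 0 else owned
        candidates = [t for t in pool if self.load[t] < self.capacity]
        if not candidates:
            return None                           # no space: the CTA must wait
        # Pick a TPC whose load is <= every other candidate's load.
        tpc = min(candidates, key=lambda t: (self.load[t], t))
        if tpc not in owned:
            self.credits[sub] -= 1                # acquiring a new TPC spends a credit
            owned.add(tpc)
        self.load[tpc] += 1
        return tpc


cwd = ComputeWorkDistributor(num_tpcs=4, capacity=2)
cwd.add_subcontext("A", credits=2)
placements = [cwd.launch_cta("A") for _ in range(5)]
```

After two credits are spent, the third and fourth CTAs still launch on TPCs 0 and 1 because those TPCs are already held and have room. The fifth has nowhere to go.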

49.

Invention application
DYNAMIC PARTITIONING OF EXECUTION RESOURCES (pending, published)

Publication (announcement) No.: US20190235924A1

Publication (announcement) date: 2019-08-01

Application No.: US15885761

Application date: 2018-01-31

Applicant: NVIDIA Corporation

Inventor: Jerome F. Duluk, Jr.; Luke Durant; Ramon Matas Navarro; Alan Menezes; Jeffrey Tuckey; Gentaro Hirota; Brian Pharris

IPC: G06F9/50, G06F12/02, G06T1/20, G06T1/60

Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.

50.

Granted invention patent
Managing event count reports in a tile-based architecture (in force)

Publication (announcement) No.: US10223122B2

Publication (announcement) date: 2019-03-05

Application No.: US15482779

Application date: 2017-04-09

Applicant: NVIDIA Corporation

Inventor: Ziyad S. Hakura; Jerome F. Duluk, Jr.

IPC: G06F9/38, G06T15/00, G06T15/40, G06T1/20, G06T1/60, G09G5/395, G09G5/00, G06T15/50, G06F12/0808, G06F12/0875, G06F9/44, G06T15/80, G06T17/20

Abstract: One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives.
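This abstract describes a fixed sequence. The first unit increments its count memory on each event, the count is written to an external report address after the first set of primitives, and the accumulating memory is updated from the count after the second set. This is a minimal Python sketch that follows that sequence literally. `FirstUnit`, `run_tile`, the boolean "event" flags and the report address are hypothetical stand-ins.

```python
from typing import Dict, Iterable


class FirstUnit:
    """Toy stand-in for the abstract's 'first unit' in the screen-space pipeline."""

    def __init__(self) -> None:
        self.count_memory = 0          # incremented on every detected event
        self.accumulating_memory = 0   # updated when the second set completes

    def process(self, primitives: Iterable[bool]) -> None:
        # Each True marks a primitive that triggers the tracked event type.
        for triggers_event in primitives:
            if triggers_event:
                self.count_memory += 1


def run_tile(unit: FirstUnit, first_set: Iterable[bool], second_set: Iterable[bool],
             external: Dict[int, int], report_addr: int) -> None:
    unit.process(first_set)
    external[report_addr] = unit.count_memory       # report once the first set completes
    unit.process(second_set)
    unit.accumulating_memory = unit.count_memory    # accumulate once the second set completes


unit = FirstUnit()
external_memory: Dict[int, int] = {}
run_tile(unit, [True, False, True, True], [True, True], external_memory, report_addr=0x100)
```

The report address captures the count at the report point (3 events), while the accumulating memory holds the running total after both sets (5 events).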
