-
Publication number: US09098924B2
Publication date: 2015-08-04
Application number: US13942415
Filing date: 2013-07-15
Applicant: NVIDIA CORPORATION
Inventor: Eric B. Lum , Jerome F. Duluk, Jr.
CPC classification number: G06T1/60 , B41F15/34 , G06T11/40 , G06T15/005
Abstract: One embodiment sets forth a method for associating each stencil value included in a stencil buffer with multiple fragments. Components within a graphics processing pipeline use a set of stencil masks to partition the bits of each stencil value. Each stencil mask selects a different subset of bits, and each fragment is strategically associated with both a stencil value and a stencil mask. Before performing stencil actions associated with a fragment, the raster operations unit performs stencil mask operations on the operands. No fragments are associated with both the same stencil mask and the same stencil value. Consequently, no fragments are associated with the same stencil bits included in the stencil buffer. Advantageously, by reducing the number of stencil bits associated with each fragment, certain classes of software applications may reduce the wasted memory associated with stencil buffers in which each stencil value is associated with a single fragment.
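The bit partitioning described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the 8-bit stencil width, the equality comparison, and the names `masked_stencil_test`, `masked_stencil_write`, `MASK_A`, and `MASK_B` are all assumptions.

```python
STENCIL_BITS = 0xFF  # assumed 8-bit stencil value


def masked_stencil_test(stored, reference, mask):
    """Compare only the bits selected by the mask (equality test assumed)."""
    return (stored & mask) == (reference & mask)


def masked_stencil_write(stored, new_value, mask):
    """Replace the bits selected by the mask and keep all other bits."""
    return (stored & ~mask & STENCIL_BITS) | (new_value & mask)


# Two hypothetical fragments share one stencil value through disjoint masks.
MASK_A = 0x0F  # fragment A owns the low four bits
MASK_B = 0xF0  # fragment B owns the high four bits

value = 0x00
value = masked_stencil_write(value, 0x05, MASK_A)
value = masked_stencil_write(value, 0xA0, MASK_B)
```

Because the two masks select disjoint bits, neither write disturbs the other fragment's portion of the shared stencil value.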
-
Publication number: US11741015B2
Publication date: 2023-08-29
Application number: US17820870
Filing date: 2022-08-18
Applicant: NVIDIA CORPORATION
Inventor: Jerome F. Duluk, Jr. , Cameron Buschardt , Sherry Cheung , James Leroy Deming , Samuel H. Duncan , Lucien Dunning , Robert George , Arvind Gopalakrishnan , Mark Hairgrove , Chenghuan Jia , John Mashey
IPC: G06F11/07 , G06F12/08 , G06F12/1072 , G06F12/109 , G06F12/12 , G06F12/10 , G06F12/1009
CPC classification number: G06F12/1009 , G06F11/073 , G06F11/0793 , G06F12/08 , G06F12/109 , G06F12/1072 , G06F12/12 , G06F12/10 , G06F2212/1016
Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
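The fault-and-update flow in this abstract can be sketched roughly as below. The dictionaries standing in for the page table and page state directory, and the functions `translate`, `service_fault`, and `copy_engine_drain`, are hypothetical. The abstract does not say which component fills the command queue, so that step is an assumption.

```python
from collections import deque


def translate(vpage, page_table):
    """MMU lookup: return the physical page, or None to signal a page fault."""
    return page_table.get(vpage)


def service_fault(vpage, page_state_directory, command_queue):
    """Assumed fault handler: queue the mapping held in the directory."""
    command_queue.append((vpage, page_state_directory[vpage]))


def copy_engine_drain(command_queue, page_table):
    """Copy engine: read queued commands and install their mappings."""
    while command_queue:
        vpage, ppage = command_queue.popleft()
        page_table[vpage] = ppage


page_table = {}
page_state_directory = {7: 42}
queue = deque()

first_lookup = translate(7, page_table)  # misses, which models the page fault
if first_lookup is None:
    service_fault(7, page_state_directory, queue)
    copy_engine_drain(queue, page_table)
second_lookup = translate(7, page_table)
```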
-
Publication number: US11307903B2
Publication date: 2022-04-19
Application number: US15885751
Filing date: 2018-01-31
Applicant: NVIDIA Corporation
Inventor: Jerome F. Duluk, Jr. , Luke Durant , Ramon Matas Navarro , Alan Menezes , Jeffrey Tuckey , Gentaro Hirota , Brian Pharris
Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
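The selection step in this abstract, picking a processor whose load is no greater than any other, can be sketched as follows. The list of integer loads, the unit launch cost, and the names `pick_processor` and `launch` are assumptions. The sketch does not model how credits are acquired or spent, and it omits the no-credit launch path mentioned for some embodiments.

```python
def pick_processor(loads):
    """Return the index of a processor whose load is <= every other load."""
    return min(range(len(loads)), key=loads.__getitem__)


def launch(credits, loads, cost=1):
    """Launch one thread group if the subcontext holds at least one credit.

    Returns the chosen processor index, or None when no credit is available.
    """
    if credits < 1:
        return None
    index = pick_processor(loads)
    loads[index] += cost  # assumed bookkeeping for the added work
    return index


loads = [3, 1, 2]
chosen = launch(credits=1, loads=loads)
```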
-
Publication number: US09830262B2
Publication date: 2017-11-28
Application number: US14133488
Filing date: 2013-12-18
Applicant: NVIDIA CORPORATION
Inventor: Jerome F. Duluk, Jr. , Cameron Buschardt , James Leroy Deming , Brian Fahs
CPC classification number: G06F12/08 , G06F11/3037 , G06F11/3442 , G06F11/3471 , G06F2201/81 , G06F2201/815 , G06F2201/88 , G06F2212/205
Abstract: Embodiments of the approaches disclosed herein include a subsystem that includes an access tracking mechanism configured to monitor access operations directed to a first memory and a second memory. The access tracking mechanism detects an access operation generated by a processor for accessing a first memory page residing on the second memory. The access tracking mechanism further determines that the first memory page is included in a first subset of memory pages residing on the second memory. The access tracking mechanism further locates, within a reference vector, a reference bit that corresponds to the first memory page, and sets the reference bit. One advantage of the present invention is that memory pages in a hybrid system migrate as needed to increase overall memory performance.
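A reference vector of the kind this abstract describes can be sketched as a bit field with one bit per tracked page. The class name `AccessTracker`, the contiguous tracked range, and the use of a Python integer as the vector are assumptions made for illustration.

```python
class AccessTracker:
    """Tracks accesses to a contiguous subset of pages (assumed layout)."""

    def __init__(self, first_page, page_count):
        self.first_page = first_page
        self.page_count = page_count
        self.reference_vector = 0  # one bit per tracked page

    def record_access(self, page):
        """Set the page's reference bit; return False if it is not tracked."""
        offset = page - self.first_page
        if not 0 <= offset < self.page_count:
            return False
        self.reference_vector |= 1 << offset
        return True


tracker = AccessTracker(first_page=100, page_count=8)
tracked = tracker.record_access(103)
untracked = tracker.record_access(200)
```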
-
Publication number: US09355430B2
Publication date: 2016-05-31
Application number: US14033389
Filing date: 2013-09-20
Applicant: NVIDIA CORPORATION
Inventor: Eric B. Lum , Cass W. Everitt , Henry Packard Moreton , Yury Y. Uralsky , Cyril Crassin , Jerome F. Duluk, Jr.
CPC classification number: G06T1/60
Abstract: One embodiment sets forth a method for allocating memory to surfaces. A software application specifies surface data, including interleaving state data. Based on the interleaving state data, a surface access unit bloats addresses derived from discrete coordinates associated with the surface, creating a bloated virtual address space with a predictable pattern of addresses that do not correspond to data. Advantageously, by creating predictable regions of addresses that do not correspond to data, the software application program may configure the surface to share physical memory space with one or more other surfaces. In particular, the software application may map the virtual address space together with one or more virtual address spaces corresponding to complementary data patterns to the same physical base address. And, by overlapping the virtual address spaces onto the same pages in physical address space, the physical memory may be more densely packed than by using prior-art allocation techniques.
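One way to picture the bloated address pattern is a simple strided layout, sketched below. The abstract does not define the interleaving state, so the `ways` and `slot` parameters, the 4-byte element size, and the function `bloated_offset` are assumptions chosen only to show complementary patterns sharing one base address.

```python
def bloated_offset(index, ways, slot, elem_size=4):
    """Byte offset of element `index` when only every `ways`-th slot holds data."""
    return (index * ways + slot) * elem_size


# Two hypothetical surfaces mapped to the same base with complementary slots.
surface_a = [bloated_offset(i, ways=2, slot=0) for i in range(4)]
surface_b = [bloated_offset(i, ways=2, slot=1) for i in range(4)]
```

Surface A leaves gaps exactly where surface B stores its data, so the two can overlap on the same physical pages without colliding.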
-
Publication number: US09082212B2
Publication date: 2015-07-14
Application number: US13723972
Filing date: 2012-12-21
Applicant: NVIDIA CORPORATION
Inventor: Jerome F. Duluk, Jr. , Jesse David Hall
IPC: G06T15/00
CPC classification number: G06T15/005
Abstract: Techniques are disclosed for dispatching pixel information in a graphics processing pipeline. A fragment processing unit in the graphics processing pipeline generates a pixel that includes multiple samples based on a portion of a graphics primitive received by a thread. The fragment processing unit calculates a set of source values, where each source value corresponds to a different sample of the pixel. The fragment processing unit retrieves a set of destination values from a render target, where each destination value corresponds to a different source value. The fragment processing unit blends each source value with a corresponding destination value to create a set of final values, and creates one or more dispatch messages to store the set of final values in a set of output registers. One advantage of the disclosed techniques is that pixel shader programs perform per-sample operations with increased efficiency.
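The per-sample blend in this abstract pairs each source value with its destination value. The sketch below assumes scalar sample values and a single-alpha linear blend; the abstract does not specify the blend equation, and `blend_samples` is a hypothetical name.

```python
def blend_samples(source, destination, alpha):
    """Blend each source sample with the matching destination sample."""
    if len(source) != len(destination):
        raise ValueError("each source value needs one destination value")
    return [alpha * s + (1.0 - alpha) * d for s, d in zip(source, destination)]


final_values = blend_samples([1.0, 0.0], [0.0, 1.0], alpha=0.5)
```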
-
Publication number: US10445243B2
Publication date: 2019-10-15
Application number: US14055356
Filing date: 2013-10-16
Applicant: NVIDIA CORPORATION
Inventor: Jerome F. Duluk, Jr. , Cameron Buschardt , Sherry Cheung , James Leroy Deming , Samuel H. Duncan , Lucien Dunning , Robert George , Arvind Gopalakrishnan , Mark Hairgrove , Chenghuan Jia , John Mashey
IPC: G06F13/12 , G06F13/38 , G06F12/1009 , G06F12/109 , G06F12/1072 , G06F11/07 , G06F12/12 , G06F12/08 , G06F12/10
Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
-
Publication number: US10303616B2
Publication date: 2019-05-28
Application number: US14055382
Filing date: 2013-10-16
Applicant: NVIDIA Corporation
Inventor: Jerome F. Duluk, Jr. , Chenghuan Jia , John Mashey , Cameron Buschardt , Sherry Cheung , James Leroy Deming , Samuel H. Duncan , Lucien Dunning , Robert George , Arvind Gopalakrishnan , Mark Hairgrove
IPC: G06F13/12 , G06F13/38 , G06F12/1009 , G06F12/109 , G06F12/1072 , G06F11/07 , G06F12/12 , G06F12/08 , G06F12/10
Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
-
Publication number: US10061526B2
Publication date: 2018-08-28
Application number: US15169532
Filing date: 2016-05-31
Applicant: NVIDIA Corporation
Inventor: John Mashey , Cameron Buschardt , James Leroy Deming , Jerome F. Duluk, Jr. , Brian Fahs
IPC: G06F3/06 , G06F12/1027 , G06F12/1009
CPC classification number: G06F3/0622 , G06F3/0631 , G06F3/0647 , G06F3/0685 , G06F12/1009 , G06F12/1027 , G06F2212/656 , G06F2212/684
Abstract: One embodiment of the present invention is a memory subsystem that includes a sliding window tracker that tracks memory accesses associated with a sliding window of memory page groups. When the sliding window tracker detects an access operation associated with a memory page group within the sliding window, the sliding window tracker sets a reference bit that is associated with the memory page group and is included in a reference vector that represents accesses to the memory page groups within the sliding window. Based on the values of the reference bits, the sliding window tracker causes the selection of a memory page in a memory page group that has fallen into disuse for migration from a first memory to a second memory. Because the sliding window tracker tunes the memory pages that are resident in the first memory to reflect memory access patterns, the overall performance of the memory subsystem is improved.
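The selection step can be sketched by scanning the reference vector for clear bits. Treating a clear bit as "fallen into disuse", numbering groups consecutively from the window start, and the name `disused_groups` are assumptions; the abstract does not give the selection policy.

```python
def disused_groups(window_start, window_size, reference_vector):
    """List page groups in the window whose reference bit is still clear."""
    return [
        window_start + i
        for i in range(window_size)
        if not (reference_vector >> i) & 1
    ]


candidates = disused_groups(window_start=10, window_size=4, reference_vector=0b0101)
```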
-
Publication number: US09952868B2
Publication date: 2018-04-24
Application number: US14043432
Filing date: 2013-10-01
Applicant: NVIDIA Corporation
Inventor: Ziyad S. Hakura , Jerome F. Duluk, Jr.
IPC: G06F9/38 , G06F9/44 , G06T15/00 , G06T15/40 , G06T1/20 , G06T1/60 , G06T15/50 , G09G5/00 , G09G5/393 , G09G5/395 , G06F12/0808 , G06F12/0875 , G06T15/80
CPC classification number: G06T1/20 , G06F9/38 , G06F9/44 , G06F12/0808 , G06F12/0875 , G06F2212/302 , G06T1/60 , G06T15/005 , G06T15/405 , G06T15/503 , G06T15/80 , G06T17/20 , G09G5/003 , G09G5/395 , Y02D10/13
Abstract: One embodiment of the present invention sets forth a graphics processing system. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline is configured to perform visibility testing and fragment shading. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to first transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a z-only mode, and then transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a normal mode. In the z-only mode, at least some fragment shading operations are disabled in the screen-space pipeline. In the normal mode, fragment shading operations are enabled.
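The two-pass ordering in this abstract can be sketched as below. Representing a primitive as a list of `(pixel, depth, payload)` fragments, using a less-than depth test, and the function `render_tile` are assumptions. The sketch disables all shading in the first pass, whereas the abstract only requires that at least some shading be disabled.

```python
import math


def render_tile(primitives, shade):
    """Process one cache tile's primitives in z-only mode, then normal mode."""
    depth = {}
    # Pass 1 (z-only): skip shading and resolve the nearest depth per pixel.
    for primitive in primitives:
        for pixel, z, _payload in primitive:
            if z < depth.get(pixel, math.inf):
                depth[pixel] = z
    # Pass 2 (normal): shade only fragments that match the resolved depth.
    color = {}
    for primitive in primitives:
        for pixel, z, payload in primitive:
            if z == depth[pixel]:
                color[pixel] = shade(payload)
    return color


shade_calls = []


def record_and_shade(payload):
    shade_calls.append(payload)
    return payload.upper()


tile_primitives = [
    [((0, 0), 0.5, "far")],
    [((0, 0), 0.2, "near"), ((1, 0), 0.9, "only")],
]
tile_color = render_tile(tile_primitives, record_and_shade)
```

In this example the occluded fragment at pixel (0, 0) is never shaded, which is the saving the first pass is meant to provide.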