-
Publication No.: US20230289215A1
Publication Date: 2023-09-14
Application No.: US17691621
Filing Date: 2022-03-10
Applicant: NVIDIA Corporation
Inventors: Greg PALMER , Gentaro HIROTA , Ronny KRASHINSKY , Ze LONG , Brian PHARRIS , Rajballav DASH , Jeff TUCKEY , Jerome F. DULUK, JR. , Lacky SHAH , Luke DURANT , Jack CHOQUETTE , Eric WERNESS , Naman GOVIL , Manan PATEL , Shayani DEB , Sandeep NAVADA , John EDMONDSON , Prakash BANGALORE PRABHAKAR , Wish GANDHI , Ravi MANYAM , Apoorv PARLE , Olivier GIROUX , Shirish GADRE , Steve HEINRICH
CPC Classes: G06F9/4881 , G06F9/3851 , G06F9/3009 , G06F9/544
Abstract: A new level of hierarchy—Cooperative Group Arrays (CGAs)—and an associated new hardware-based work distribution/execution model are described. A CGA is a grid of thread blocks (also referred to as cooperative thread arrays (CTAs)). CGAs provide co-scheduling, e.g., control over where CTAs are placed/executed in a processor (such as a GPU), relative to the memory required by an application and relative to each other. Hardware support for such CGAs guarantees concurrency and enables applications to see more data locality, reduced latency, and better synchronization between all the threads in tightly cooperating collections of CTAs programmably distributed across different (e.g., hierarchical) hardware domains or partitions.
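The key property the abstract describes is all-or-nothing placement: every CTA of a CGA must be co-resident within one hardware domain before any of them launches. A minimal Python sketch of that scheduling decision, with entirely illustrative names (`Partition`, `try_launch_cga`) that are not NVIDIA's API:

```python
# Toy model of CGA co-scheduling: a CGA launches only when every one of
# its CTAs can be placed concurrently on SMs within a single partition.
# All names and the slot-based capacity model are illustrative.

class Partition:
    def __init__(self, num_sms, ctas_per_sm):
        self.free = [ctas_per_sm] * num_sms  # free CTA slots per SM

    def try_launch_cga(self, num_ctas):
        """Atomically place all CTAs of a CGA, or none (all-or-nothing)."""
        placement = []
        free = self.free[:]  # tentative copy; commit only on success
        for _ in range(num_ctas):
            sm = max(range(len(free)), key=lambda i: free[i])
            if free[sm] == 0:
                return None  # cannot guarantee concurrency: refuse launch
            free[sm] -= 1
            placement.append(sm)
        self.free = free  # every CTA fits: commit the placement
        return placement
```

The refusal to partially launch is what lets software assume all CTAs of a CGA are concurrent and can synchronize with each other.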
-
Publication No.: US20210019185A1
Publication Date: 2021-01-21
Application No.: US17063705
Filing Date: 2020-10-05
Applicant: NVIDIA CORPORATION
Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.
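The structure the abstract describes can be sketched in a few lines of Python: per-priority FIFO lists of TMD references, with one possible selection scheme (strict priority) standing in for the "different scheduling schemes". The class and field names are illustrative, not the hardware's:

```python
from collections import deque
from dataclasses import dataclass, field

# Toy model of priority-grouped task scheduling: each group is a FIFO list
# of references to task metadata (TMD); selection is strict priority here,
# one of several possible schemes.

@dataclass
class TMD:
    task_id: int
    params: dict = field(default_factory=dict)  # state to init/schedule/execute

class TaskScheduler:
    def __init__(self):
        self.groups = {}  # priority level -> FIFO of TMD references

    def submit(self, tmd, priority):
        self.groups.setdefault(priority, deque()).append(tmd)

    def next_task(self):
        """Pick from the highest-priority non-empty group."""
        for prio in sorted(self.groups, reverse=True):
            if self.groups[prio]:
                return self.groups[prio].popleft()
        return None
```

Because each TMD carries its full state, tasks can be dequeued and executed out of submission order without consulting the submitting process.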
-
Publication No.: US20190340145A1
Publication Date: 2019-11-07
Application No.: US16450830
Filing Date: 2019-06-24
Applicant: NVIDIA CORPORATION
Inventors: Jerome F. DULUK, JR. , Cameron BUSCHARDT , James Leroy DEMING , Brian FAHS , Mark HAIRGROVE , John MASHEY
IPC Classes: G06F13/40 , G06F12/123
Abstract: Techniques are disclosed for tracking memory page accesses in a unified virtual memory system. An access tracking unit detects a memory page access generated by a first processor for accessing a memory page in a memory system of a second processor. The access tracking unit determines whether a cache memory includes an entry for the memory page. If so, then the access tracking unit increments an associated access counter. Otherwise, the access tracking unit attempts to find an unused entry in the cache memory that is available for allocation. If such an entry is found, the access tracking unit associates it with the memory page and sets its access counter to an initial value. Otherwise, the access tracking unit selects a valid entry in the cache memory; clears its valid bit; associates the entry with the memory page; and initializes its access counter.
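The three-way decision in the abstract (hit, free-entry allocation, eviction) maps directly onto a small Python model. The lowest-counter eviction policy below is an assumption for illustration; the abstract only says a valid entry is selected:

```python
# Toy model of the access-tracking flow: look up the page, bump its counter
# on a hit, otherwise allocate an unused entry, otherwise evict a valid one.
# The entry layout and the lowest-counter victim choice are illustrative.

class AccessTracker:
    def __init__(self, capacity, initial=1):
        self.capacity = capacity
        self.initial = initial
        self.entries = {}  # page -> access counter (valid entries only)

    def track(self, page):
        if page in self.entries:                 # hit: increment counter
            self.entries[page] += 1
        elif len(self.entries) < self.capacity:  # unused entry available
            self.entries[page] = self.initial
        else:                                    # no free entry: evict
            victim = min(self.entries, key=self.entries.get)
            del self.entries[victim]             # clear valid bit, reassociate
            self.entries[page] = self.initial
        return self.entries[page]
```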
-
Publication No.: US20190243652A9
Publication Date: 2019-08-08
Application No.: US15960332
Filing Date: 2018-04-23
Applicant: NVIDIA Corporation
IPC Classes: G06F9/38 , G06T15/00 , G06T1/20 , G06F9/44 , G06F12/0875 , G06T15/80 , G06T1/60 , G06F12/0808 , G06T15/50 , G09G5/00 , G09G5/395 , G06T15/40 , G06T17/20
CPC Classes: G06T1/20 , G06F9/38 , G06F9/44 , G06F12/0808 , G06F12/0875 , G06F2212/302 , G06T1/60 , G06T15/005 , G06T15/405 , G06T15/503 , G06T15/80 , G06T17/20 , G09G5/003 , G09G5/395 , Y02D10/13
Abstract: One embodiment of the present invention sets forth a graphics processing system. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline is configured to perform visibility testing and fragment shading. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to first transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a z-only mode, and then transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a normal mode. In the z-only mode, at least some fragment shading operations are disabled in the screen-space pipeline. In the normal mode, fragment shading operations are enabled.
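The payoff of sending the same primitives twice is that the z-only pass pre-populates the depth buffer, so the normal pass shades only visible fragments. A minimal sketch, assuming a flat list of (pixel, depth) fragments as the representation, which is an illustrative simplification:

```python
# Toy model of the two-pass scheme: the same primitives are processed twice
# per cache tile, first in z-only mode (depth updates only, shading disabled),
# then in normal mode, where the depth buffer rejects occluded fragments
# before the fragment shader runs.

def render_tile(fragments):
    """fragments: list of (pixel, depth) pairs. Returns shader invocations."""
    zbuffer = {}
    # Pass 1 (z-only): visibility testing only, fragment shading disabled.
    for pixel, depth in fragments:
        if depth < zbuffer.get(pixel, float("inf")):
            zbuffer[pixel] = depth
    # Pass 2 (normal): shade only fragments that survive the depth test.
    shaded = 0
    for pixel, depth in fragments:
        if depth <= zbuffer[pixel]:
            shaded += 1  # fragment shader would run here
    return shaded
```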
-
Publication No.: US20170199689A1
Publication Date: 2017-07-13
Application No.: US15169532
Filing Date: 2016-05-31
Applicant: NVIDIA Corporation
IPC Classes: G06F3/06 , G06F12/1009 , G06F12/1027
CPC Classes: G06F3/0622 , G06F3/0631 , G06F3/0647 , G06F3/0685 , G06F12/1009 , G06F12/1027 , G06F2212/656 , G06F2212/684
Abstract: One embodiment of the present invention is a memory subsystem that includes a sliding window tracker that tracks memory accesses associated with a sliding window of memory page groups. When the sliding window tracker detects an access operation associated with a memory page group within the sliding window, the sliding window tracker sets a reference bit that is associated with the memory page group and is included in a reference vector that represents accesses to the memory page groups within the sliding window. Based on the values of the reference bits, the sliding window tracker causes a memory page in a memory page group that has fallen into disuse to be migrated from a first memory to a second memory. Because the sliding window tracker tunes the memory pages that are resident in the first memory to reflect memory access patterns, the overall performance of the memory subsystem is improved.
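The reference-vector mechanism can be sketched as one bit per page group in the window; a clear bit marks the group as disused and its pages as migration candidates. Class and method names below are illustrative:

```python
# Toy model of the sliding-window tracker: one reference bit per page group
# in the window; groups whose bit stays clear have fallen into disuse and
# are candidates for migration out of the first (fast) memory.

class SlidingWindowTracker:
    def __init__(self, window_groups):
        self.window = list(window_groups)
        self.ref = {g: 0 for g in self.window}  # reference vector

    def access(self, group):
        if group in self.ref:
            self.ref[group] = 1  # set the group's reference bit

    def migration_candidates(self):
        """Groups in the window whose reference bit was never set."""
        return [g for g in self.window if self.ref[g] == 0]
```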
-
Publication No.: US20140281356A1
Publication Date: 2014-09-18
Application No.: US14011655
Filing Date: 2013-08-27
Applicant: NVIDIA CORPORATION
Inventors: Cameron BUSCHARDT , Jerome F. DULUK, JR. , John MASHEY , Mark HAIRGROVE , James Leroy DEMING , Brian FAHS
IPC Classes: G06F12/10
CPC Classes: G06F12/1009 , G06F2212/301
Abstract: One embodiment of the present invention includes a microcontroller coupled to a memory management unit (MMU). The MMU is coupled to a page table included in a physical memory, and the microcontroller is configured to perform one or more virtual memory operations associated with the physical memory and the page table. In operation, the microcontroller receives a page fault generated by the MMU in response to an invalid memory access via a virtual memory address. To remedy such a page fault, the microcontroller performs actions to map the virtual memory address to an appropriate location in the physical memory. By contrast, in prior-art systems, a fault handler would typically remedy the page fault. Advantageously, because the microcontroller executes these tasks locally with respect to the MMU and the physical memory, latency associated with remedying page faults may be decreased. Consequently, overall system performance may be increased.
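The fault path in the abstract reduces to: the MMU misses in the page table, a handler local to the MMU installs a mapping, and the access retries. A minimal sketch, where the trivial bump allocator and all names are assumptions for illustration only:

```python
# Toy model of the fault path: the MMU raises a page fault on an unmapped
# virtual address, and a microcontroller local to the MMU maps the address
# into physical memory instead of a remote CPU-side fault handler.

class MMU:
    def __init__(self):
        self.page_table = {}  # virtual page -> physical frame

    def translate(self, vpage, fault_handler):
        if vpage not in self.page_table:   # invalid access: page fault
            fault_handler(self, vpage)     # local microcontroller remedies it
        return self.page_table[vpage]

class Microcontroller:
    def __init__(self):
        self.next_frame = 0  # trivial bump allocator for free frames

    def handle_fault(self, mmu, vpage):
        """Map the faulting virtual page to the next free physical frame."""
        mmu.page_table[vpage] = self.next_frame
        self.next_frame += 1
```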
-
Publication No.: US20140122829A1
Publication Date: 2014-05-01
Application No.: US13660815
Filing Date: 2012-10-25
Applicant: NVIDIA CORPORATION
Inventors: Nick BARROW-WILLIAMS , Brian FAHS , Jerome F. DULUK, JR. , James Leroy DEMING , Timothy John PURCELL , Lucien DUNNING , Mark HAIRGROVE
IPC Classes: G06F12/10
CPC Classes: G06F12/08 , G06F12/1009 , G06F12/1027 , G06F2212/684
Abstract: A technique for simultaneously executing multiple tasks, each having an independent virtual address space, involves assigning an address space identifier (ASID) to each task and constructing each virtual memory access request to include both a virtual address and the ASID. During virtual to physical address translation, the ASID selects a corresponding page table, which includes virtual to physical address mappings for the ASID and associated task. Entries for a translation look-aside buffer (TLB) include both the virtual address and ASID to complete each mapping to a physical address. Deep scheduling of tasks sharing a virtual address space may be implemented to improve cache affinity for both TLB and data caches.
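The core idea is that the TLB tag includes the ASID, so the same virtual address can map to different physical addresses for different tasks without flushing the TLB between them. A minimal sketch with illustrative structures:

```python
# Toy model of ASID-tagged translation: TLB entries are keyed by
# (ASID, virtual page), so tasks with independent address spaces can
# share the TLB without conflicts or flushes.

class TLB:
    def __init__(self, page_tables):
        self.page_tables = page_tables  # ASID -> {virtual page: physical page}
        self.entries = {}               # (ASID, virtual page) -> physical page

    def translate(self, asid, vpage):
        key = (asid, vpage)
        if key not in self.entries:  # TLB miss: walk the ASID's own page table
            self.entries[key] = self.page_tables[asid][vpage]
        return self.entries[key]
```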
-
Publication No.: US20210073125A1
Publication Date: 2021-03-11
Application No.: US16562361
Filing Date: 2019-09-05
Applicant: NVIDIA CORPORATION
Inventors: Jerome F. DULUK, JR. , Gregory Scott PALMER , Jonathon Stuart Ramsey EVANS , Shailendra SINGH , Samuel H. DUNCAN , Wishwesh Anil GANDHI , Lacky V. SHAH , Eric ROCK , Feiqi SU , James Leroy DEMING , Alan MENEZES , Pranav VAIDYA , Praveen JOGINIPALLY , Timothy John PURCELL , Manas MANDAL
IPC Classes: G06F12/06
Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.
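The admin/guest split the abstract describes can be sketched as two operations: an admin call that reserves a disjoint subset of resources, and a guest call that is confined to its own reservation. The SM-count granularity and all names are illustrative assumptions:

```python
# Toy model of PPU partitioning: an admin carves the PPU's resources into
# disjoint partitions, and each guest can only consume resources from its
# own partition, in isolation from other guests.

class PPU:
    def __init__(self, total_sms):
        self.total_sms = total_sms
        self.partitions = {}  # guest -> number of SMs reserved

    def partition(self, guest, sms):
        """Admin operation: reserve a disjoint subset of SMs for a guest."""
        if sms + sum(self.partitions.values()) > self.total_sms:
            raise ValueError("not enough free SMs")
        self.partitions[guest] = sms

    def run(self, guest, sms_needed):
        """Guest work is confined to its own partition's resources."""
        return sms_needed <= self.partitions.get(guest, 0)
```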
-
Publication No.: US20210073025A1
Publication Date: 2021-03-11
Application No.: US16562359
Filing Date: 2019-09-05
Applicant: NVIDIA CORPORATION
Inventors: Jerome F. DULUK, JR. , Gregory Scott PALMER , Jonathon Stuart Ramsey EVANS , Shailendra SINGH , Samuel H. DUNCAN , Wishwesh Anil GANDHI , Lacky V. SHAH , Eric ROCK , Feiqi SU , James Leroy DEMING , Alan MENEZES , Pranav VAIDYA , Praveen JOGINIPALLY , Timothy John PURCELL , Manas MANDAL
Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.
-
Publication No.: US20150339799A1
Publication Date: 2015-11-26
Application No.: US14817151
Filing Date: 2015-08-03
Applicant: NVIDIA CORPORATION
Inventors: Eric B. LUM , Jerome F. DULUK, JR.
CPC Classes: G06T1/60 , B41F15/34 , G06T11/40 , G06T15/005
Abstract: One embodiment sets forth a method for associating each stencil value included in a stencil buffer with multiple fragments. Components within a graphics processing pipeline use a set of stencil masks to partition the bits of each stencil value. Each stencil mask selects a different subset of bits, and each fragment is strategically associated with both a stencil value and a stencil mask. Before performing stencil actions associated with a fragment, the raster operations unit performs stencil mask operations on the operands. No fragments are associated with both the same stencil mask and the same stencil value. Consequently, no fragments are associated with the same stencil bits included in the stencil buffer. Advantageously, by reducing the number of stencil bits associated with each fragment, certain classes of software applications may reduce the wasted memory associated with stencil buffers in which each stencil value is associated with a single fragment.
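The bit-partitioning idea can be shown with plain masked read/write helpers: two fragments share one 8-bit stencil value because their masks select disjoint bit subsets. The specific nibble masks below are illustrative:

```python
# Toy model of sharing one 8-bit stencil value among multiple fragments:
# each fragment is associated with a stencil mask selecting a disjoint
# subset of bits, and stencil operations are applied under that mask.

def stencil_write(buffer_value, mask, fragment_bits):
    """Update only the bits selected by the fragment's stencil mask."""
    return (buffer_value & ~mask) | (fragment_bits & mask)

def stencil_read(buffer_value, mask):
    """Read back only the fragment's own stencil bits."""
    return buffer_value & mask
```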
-