Replaying memory transactions while resolving memory access faults
    41.
    Invention Grant
    Replaying memory transactions while resolving memory access faults (In Force)

    Publication Number: US09575892B2

    Publication Date: 2017-02-21

    Application Number: US14109678

    Filing Date: 2013-12-17

    Abstract: One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.
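
    The following is a minimal C++ sketch of the replay mechanism described in the abstract, written as a software model rather than hardware. The transaction type, the executor callback, and all identifiers are assumptions made for illustration; they are not taken from the patent.

        #include <cstdint>
        #include <deque>
        #include <functional>

        // Hypothetical software model of one per-SM replay unit (illustrative names).
        struct MemTransaction {
            std::uint64_t virtualAddr;
            bool          isWrite;
        };

        class ReplayUnit {
        public:
            // The executor stands in for the memory system: it returns true if a
            // transaction completes and false if it page-faults.
            explicit ReplayUnit(std::function<bool(const MemTransaction&)> executor)
                : execute_(std::move(executor)) {}

            // The owning SM may issue new memory transactions only while not stalled.
            bool canIssue() const { return !smStalled_; }

            // A transaction from this SM page-faulted: stall this SM only (other SMs
            // keep issuing) and buffer the faulting transaction for later replay.
            void onPageFault(const MemTransaction& t) {
                smStalled_ = true;
                replayBuffer_.push_back(t);
            }

            // A page fault was resolved: replay the buffered transactions, removing
            // the ones that now succeed, and resume issuing once the buffer drains.
            void onFaultResolved() {
                for (auto it = replayBuffer_.begin(); it != replayBuffer_.end();) {
                    if (execute_(*it))
                        it = replayBuffer_.erase(it);   // success: drop from buffer
                    else
                        ++it;                           // still faulting: keep buffered
                }
                if (replayBuffer_.empty())
                    smStalled_ = false;
            }

        private:
            std::function<bool(const MemTransaction&)> execute_;
            std::deque<MemTransaction> replayBuffer_;
            bool smStalled_ = false;
        };

    The point that matches the abstract is that the stall and the buffer are per SM, so an unaffected SM never blocks on another SM's page fault.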


    Migration directives in a unified virtual memory system architecture
    42.
    Invention Grant
    Migration directives in a unified virtual memory system architecture (In Force)

    Publication Number: US09430400B2

    Publication Date: 2016-08-30

    Application Number: US14109712

    Filing Date: 2013-12-17

    CPC Classification Numbers: G06F12/1027, G06F12/08, G06F12/1009

    Abstract: One embodiment of the present invention sets forth a computer-implemented method for altering migration rules for a unified virtual memory system. The method includes detecting that a migration rule trigger has been satisfied. The method also includes identifying a migration rule action that is associated with the migration rule trigger. The method further includes executing the migration rule action. Other embodiments of the present invention include a computer-readable medium, a computing device, and a unified virtual memory subsystem. One advantage of the disclosed approach is that various settings of the unified virtual memory system may be modified during program execution. This ability to alter the settings allows for an application to vary the manner in which memory pages are migrated and otherwise manipulated, which provides the application the ability to optimize the unified virtual memory system for efficient execution.
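
    As a rough illustration of the trigger/action pairing the abstract describes, the C++ sketch below keeps a table of rules that a unified-virtual-memory runtime could evaluate during program execution. The rule contents and all names are assumptions; the patent does not prescribe this interface.

        #include <functional>
        #include <utility>
        #include <vector>

        // Hypothetical model of migration rules: each rule pairs a trigger predicate
        // with an action that alters migration behavior when the trigger fires.
        struct MigrationRule {
            std::function<bool()> trigger;  // e.g. "page P was accessed N times remotely"
            std::function<void()> action;   // e.g. "migrate page P" or "change a setting"
        };

        class MigrationPolicy {
        public:
            void addRule(MigrationRule rule) { rules_.push_back(std::move(rule)); }

            // Called periodically (or on specific events) by the runtime: detect which
            // triggers have been satisfied and execute the associated actions, so the
            // unified virtual memory settings can change while the program runs.
            void evaluate() {
                for (auto& rule : rules_)
                    if (rule.trigger())
                        rule.action();
            }

        private:
            std::vector<MigrationRule> rules_;
        };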


    Managing per-tile event count reports in a tile-based architecture
    44.
    Invention Grant
    Managing per-tile event count reports in a tile-based architecture (In Force)

    Publication Number: US09311097B2

    Publication Date: 2016-04-12

    Application Number: US14061409

    Filing Date: 2013-10-23

    Abstract: A graphics processing system is configured to track per-tile event counts in a tile-based architecture. A tiling unit in the graphics processing system is configured to cause a screen-space pipeline to load a count value associated with a first cache tile into a count memory and to cause the screen-space pipeline to process a first set of primitives that intersect the first cache tile. The tiling unit is further configured to cause the screen-space pipeline to store a second count value in a report memory location. The tiling unit is also configured to cause the screen-space pipeline to process a second set of primitives that intersect the first cache tile and to cause the screen-space pipeline to store a third count value in a first accumulating memory. Conditional rendering operations may be performed on a per-cache-tile basis, based on the per-tile event count.
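
    A compact C++ sketch of how a per-tile running count could be kept by saving and restoring a single count register around each batch of primitives, in the spirit of the abstract. The count memory, report memory, and accumulating memory are modeled as plain integers; all names are illustrative assumptions.

        #include <cstdint>
        #include <unordered_map>

        // Hypothetical model of per-cache-tile event counting (illustrative names).
        class TilingUnitModel {
        public:
            // Process one batch of primitives that intersect one cache tile.
            void processBatchForTile(int tileId, std::uint64_t eventsInBatch, bool emitReport) {
                // 1. Load the tile's accumulated count into the pipeline's count memory.
                countMemory_ = accumulatingMemory_[tileId];
                // 2. The screen-space pipeline processes the primitives, incrementing
                //    the count memory each time the tracked event type is detected.
                countMemory_ += eventsInBatch;
                // 3. Optionally store the current value to a report memory location,
                //    e.g. to answer a query used for conditional rendering on this tile.
                if (emitReport)
                    reportMemory_[tileId] = countMemory_;
                // 4. Store the updated value back to the tile's accumulating memory so
                //    the next batch for this tile resumes where this one left off.
                accumulatingMemory_[tileId] = countMemory_;
            }

            std::uint64_t reportedCount(int tileId) const { return reportMemory_.at(tileId); }

        private:
            std::uint64_t countMemory_ = 0;                              // single hardware counter
            std::unordered_map<int, std::uint64_t> accumulatingMemory_;  // per-tile running totals
            std::unordered_map<int, std::uint64_t> reportMemory_;        // per-tile reported values
        };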


    PCIe traffic tracking hardware in a unified virtual memory system

    Publication Number: US11210253B2

    Publication Date: 2021-12-28

    Application Number: US16450830

    Filing Date: 2019-06-24

    Abstract: Techniques are disclosed for tracking memory page accesses in a unified virtual memory system. An access tracking unit detects a memory page access generated by a first processor that targets a memory page in the memory system of a second processor. The access tracking unit determines whether a cache memory includes an entry for the memory page. If it does, the access tracking unit increments the associated access counter. Otherwise, the access tracking unit attempts to find an unused entry in the cache memory that is available for allocation. If one is found, the access tracking unit associates that entry with the memory page and sets the associated access counter to an initial value. Otherwise, the access tracking unit selects a valid entry in the cache memory, clears the associated valid bit, associates the entry with the memory page, and initializes the associated access counter.
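
    The lookup, allocate, and evict flow in the abstract can be sketched in C++ as below. The entry count, the victim-selection policy, and every identifier are assumptions for illustration only.

        #include <cstdint>
        #include <vector>

        // Hypothetical model of the access tracking cache: one entry per tracked
        // memory page, each holding a valid bit, a page identifier, and a counter.
        struct TrackEntry {
            bool          valid = false;
            std::uint64_t page  = 0;
            std::uint32_t count = 0;
        };

        class AccessTracker {
        public:
            explicit AccessTracker(std::size_t entries) : cache_(entries) {}

            // Called when the first processor accesses a page resident in the second
            // processor's memory.
            void onRemoteAccess(std::uint64_t page) {
                // 1. Hit: increment the access counter associated with the page.
                for (auto& e : cache_) {
                    if (e.valid && e.page == page) {
                        ++e.count;
                        return;
                    }
                }
                // 2. Miss: allocate an unused entry if one is available and set its
                //    counter to an initial value.
                for (auto& e : cache_) {
                    if (!e.valid) {
                        e.valid = true;
                        e.page  = page;
                        e.count = 1;
                        return;
                    }
                }
                // 3. No free entry: pick a valid victim, clear its valid bit, then
                //    reuse it for the new page (the victim choice here is a simplification).
                TrackEntry& victim = cache_.front();
                victim.valid = false;
                victim.page  = page;
                victim.count = 1;
                victim.valid = true;
            }

        private:
            std::vector<TrackEntry> cache_;
        };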

    Dynamic partitioning of execution resources

    Publication Number: US10817338B2

    Publication Date: 2020-10-27

    Application Number: US15885761

    Filing Date: 2018-01-31

    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.
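
    A minimal C++ sketch of the credit-gated, load-based launch decision described in the abstract. "Processor" here stands in for a TPC, the credit and load fields are simplified, the already-acquired-TPC shortcut mentioned in the abstract is omitted, and all names are assumptions.

        #include <algorithm>
        #include <vector>

        // Hypothetical model of the compute work distributor's launch decision.
        struct Subcontext {
            int credits = 0;   // processors this subcontext may still acquire
        };

        struct Processor {
            int load = 0;      // e.g. number of resident thread groups
        };

        // Returns the index of the processor to launch the thread group on,
        // or -1 if the launch must wait for a credit.
        int selectProcessor(Subcontext& sc, std::vector<Processor>& procs) {
            if (sc.credits <= 0 || procs.empty())
                return -1;
            // Pick a processor whose load is less than or equal to the load of
            // every other processor in the pool.
            auto least = std::min_element(
                procs.begin(), procs.end(),
                [](const Processor& a, const Processor& b) { return a.load < b.load; });
            --sc.credits;      // spend one credit to acquire the processor
            ++least->load;     // the new thread group now contributes to its load
            return static_cast<int>(least - procs.begin());
        }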

    DYNAMIC PARTITIONING OF EXECUTION RESOURCES
    49.
    Invention Application

    Publication Number: US20190235924A1

    Publication Date: 2019-08-01

    Application Number: US15885761

    Filing Date: 2018-01-31

    Abstract: Embodiments of the present invention set forth techniques for allocating execution resources to groups of threads within a graphics processing unit. A compute work distributor included in the graphics processing unit receives an indication from a process that a first group of threads is to be launched. The compute work distributor determines that a first subcontext associated with the process has at least one processor credit. In some embodiments, CTAs may be launched even when there are no processor credits, if one of the TPCs that was already acquired has sufficient space. The compute work distributor identifies a first processor included in a plurality of processors that has a processing load that is less than or equal to the processor loads associated with all other processors included in the plurality of processors. The compute work distributor launches the first group of threads to execute on the first processor.

    Managing event count reports in a tile-based architecture

    Publication Number: US10223122B2

    Publication Date: 2019-03-05

    Application Number: US15482779

    Filing Date: 2017-04-09

    Abstract: One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives.
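
    The two update paths in the abstract (write the count to an external address after one set of primitives, fold it into the accumulating memory after another) can be sketched in C++ as below. Whether the accumulating update copies or adds, and every identifier used here, are assumptions for illustration.

        #include <cstdint>
        #include <unordered_map>

        // Hypothetical model of one counting unit in the screen-space pipeline.
        class EventCountUnit {
        public:
            // The unit detects an occurrence of the tracked event type.
            void onEvent() { ++countMemory_; }

            // Tiling-unit-issued command after a set of primitives completes: update
            // an external memory address to reflect the current count.
            void reportTo(std::uint64_t externalAddr,
                          std::unordered_map<std::uint64_t, std::uint64_t>& externalMemory) const {
                externalMemory[externalAddr] = countMemory_;
            }

            // Tiling-unit-issued command after another set of primitives completes:
            // update the accumulating memory to reflect the current count (modeled
            // here as a copy; the abstract does not specify copy vs. add).
            void accumulate() { accumulatingMemory_ = countMemory_; }

            std::uint64_t accumulated() const { return accumulatingMemory_; }

        private:
            std::uint64_t countMemory_        = 0;
            std::uint64_t accumulatingMemory_ = 0;
        };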
