INTERMEDIATE CACHE MANAGEMENT FOR NON-UNIFORM MEMORY ARCHITECTURE

    Publication Number: US20240411706A1

    Publication Date: 2024-12-12

    Application Number: US18208059

    Application Date: 2023-06-09

    Abstract: A cache controller of a processing system implementing a non-uniform memory architecture (NUMA) adjusts a cache replacement priority of local and non-local data stored at a cache based on a cache replacement policy. Local data is data accessed by the cache via a local memory channel, and non-local data is data accessed by the cache via a non-local memory channel. The cache controller assigns priorities to local and non-local data stored at the cache based on the cache replacement policy and selects data for replacement at the cache based, at least in part, on the assigned priorities.
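
    A minimal C++ sketch of the general idea, assuming a set-associative cache with a per-line priority field and a configurable retention bonus for non-locally fetched lines; the structures and the particular bias toward retaining non-local data are illustrative assumptions, not the claimed implementation.

        // Illustrative only: bias victim selection by data locality.
        #include <algorithm>
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        struct CacheLine {
            uint64_t tag;
            bool     nonLocal;   // filled via a non-local (remote) memory channel
            uint32_t priority;   // higher value = retained longer
        };

        // Hypothetical policy: give non-local lines extra retention priority,
        // since refilling them over a remote channel is more expensive.
        void assignPriorities(std::vector<CacheLine>& set, uint32_t nonLocalBonus) {
            for (auto& line : set)
                line.priority = line.nonLocal ? nonLocalBonus : 0u;
        }

        // Pick the victim with the lowest priority in a (non-empty) set.
        std::size_t selectVictim(const std::vector<CacheLine>& set) {
            auto it = std::min_element(set.begin(), set.end(),
                [](const CacheLine& a, const CacheLine& b) {
                    return a.priority < b.priority;
                });
            return static_cast<std::size_t>(it - set.begin());
        }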

    Cache blocking for dispatches
    Invention Grant

    Publication Number: US12189534B2

    Publication Date: 2025-01-07

    Application Number: US17564474

    Application Date: 2021-12-29

    Abstract: A processing system divides successive dispatches of work items into portions. The successive dispatches are separated from each other by barriers, each barrier indicating that the work items of the previous dispatch must complete execution before work items of a subsequent dispatch can begin execution. In some embodiments, the processing system interleaves execution of portions of a first dispatch with portions of subsequent dispatches that consume data produced by the first dispatch. The processing system thereby reduces the amount of data written to the local cache by a producer dispatch while preserving data locality for a subsequent consumer (or consumer/producer) dispatch and facilitating processing efficiency.
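
    The following C++ sketch illustrates the interleaving idea under simplified assumptions: each dispatch has already been split into equally sized portions, and the i-th consumer portion reads only the output of the i-th producer portion. The Dispatch and Portion types are hypothetical.

        // Illustrative only: interleave producer and consumer dispatch portions.
        #include <algorithm>
        #include <cstddef>
        #include <functional>
        #include <vector>

        using Portion = std::function<void()>;   // one cache-sized block of work items

        struct Dispatch {
            std::vector<Portion> portions;
        };

        // Run producer portion i, then the consumer portion that reads its output,
        // instead of finishing the whole producer dispatch (and evicting its data
        // from the local cache) before the consumer starts.
        void runInterleaved(const Dispatch& producer, const Dispatch& consumer) {
            const std::size_t n = std::min(producer.portions.size(),
                                           consumer.portions.size());
            for (std::size_t i = 0; i < n; ++i) {
                producer.portions[i]();   // produce a block of data into the cache
                // a per-portion barrier would be enforced here on real hardware
                consumer.portions[i]();   // consume that block while it is resident
            }
        }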

    Dead surface invalidation
    Invention Grant

    Publication Number: US12033239B2

    Publication Date: 2024-07-09

    Application Number: US17563950

    Application Date: 2021-12-28

    CPC classification number: G06T1/60 G06F12/0891 G06T1/20 G06F2212/455

    Abstract: Systems, apparatuses, and methods for performing dead surface invalidation are disclosed. An application sends draw call commands to a graphics processing unit (GPU) via a driver, with the draw call commands rendering to surfaces. After it is determined that a given surface will no longer be accessed by subsequent draw calls, the application sends a surface invalidation command for the given surface to a command processor of the GPU. After the command processor receives the surface invalidation command, the command processor waits for a shader engine to send a draw call completion message for a last draw call to access the given surface. Once the command processor receives the draw call completion message, the command processor sends a surface invalidation command to a cache to invalidate cache lines for the given surface to free up space in the cache for other data.
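
    A simplified C++ sketch of the deferred-invalidation handshake, assuming the command processor tracks, for each surface, the identifier of the last draw call that accesses it; the class and method names are illustrative, not actual GPU firmware interfaces.

        // Illustrative only: defer a surface invalidation until its last draw completes.
        #include <cstdint>
        #include <unordered_map>
        #include <unordered_set>

        struct Cache {
            std::unordered_map<uint32_t, int> residentLines;   // surfaceId -> line count
            void invalidateSurface(uint32_t surfaceId) { residentLines.erase(surfaceId); }
        };

        struct CommandProcessor {
            Cache* cache = nullptr;
            std::unordered_map<uint32_t, uint32_t> lastDrawForSurface;   // surfaceId -> draw id
            std::unordered_set<uint32_t> pendingInvalidations;

            // The application has marked the surface dead; remember it and wait.
            void onSurfaceInvalidateCommand(uint32_t surfaceId) {
                pendingInvalidations.insert(surfaceId);
            }

            // A shader engine reports a completed draw call; if it was the last draw
            // touching a dead surface, forward the invalidation to the cache.
            void onDrawCallComplete(uint32_t drawId) {
                for (auto it = pendingInvalidations.begin(); it != pendingInvalidations.end();) {
                    if (lastDrawForSurface[*it] == drawId) {
                        cache->invalidateSurface(*it);
                        it = pendingInvalidations.erase(it);
                    } else {
                        ++it;
                    }
                }
            }
        };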

    Compressing texture data on a per-channel basis

    Publication Number: US11694367B2

    Publication Date: 2023-07-04

    Application Number: US17716186

    Application Date: 2022-04-08

    CPC classification number: G06T9/00 G06T1/60 G06T2200/04

    Abstract: Sampling circuitry independently accesses channels of texture data that represent a set of pixels. One or more processing units separately compress the channels of the texture data and store compressed data representative of the channels of the texture data for the set of pixels. The channels can include a red channel, a blue channel, and a green channel that represent color values of the set of pixels and an alpha channel that represents degrees of transparency of the set of pixels. Storing the compressed data can include writing the compressed data to portions of a cache. The processing units can identify a subset of the set of pixels that share a value of a first channel of the channels and represent the value of the first channel over the subset of the set of pixels using information representing the value, the first channel, and boundaries of the subset.
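
    A toy C++ sketch of per-channel handling, assuming tightly packed 8-bit RGBA texels and a simple run-length encoder standing in for the compressor; the abstract does not specify this particular compression scheme.

        // Illustrative only: split RGBA texels into planes and compress each plane.
        #include <array>
        #include <cstddef>
        #include <cstdint>
        #include <utility>
        #include <vector>

        using Plane = std::vector<uint8_t>;

        // Separate an interleaved RGBA buffer (size divisible by 4) into four planes.
        std::array<Plane, 4> splitChannels(const std::vector<uint8_t>& rgba) {
            std::array<Plane, 4> planes;
            for (std::size_t i = 0; i + 3 < rgba.size(); i += 4)
                for (std::size_t c = 0; c < 4; ++c)
                    planes[c].push_back(rgba[i + c]);
            return planes;
        }

        // Toy per-channel compressor: runs of texels sharing a value collapse
        // into (value, run length) pairs, one independent stream per channel.
        std::vector<std::pair<uint8_t, uint32_t>> compressPlane(const Plane& plane) {
            std::vector<std::pair<uint8_t, uint32_t>> runs;
            for (uint8_t v : plane) {
                if (!runs.empty() && runs.back().first == v)
                    ++runs.back().second;
                else
                    runs.emplace_back(v, 1u);
            }
            return runs;
        }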

    Compressing texture data on a per-channel basis

    Publication Number: US11308648B2

    Publication Date: 2022-04-19

    Application Number: US17030048

    Application Date: 2020-09-23

    Abstract: Sampling circuitry independently accesses channels of texture data that represent a set of pixels. One or more processing units separately compress the channels of the texture data and store compressed data representative of the channels of the texture data for the set of pixels. The channels can include a red channel, a blue channel, and a green channel that represent color values of the set of pixels and an alpha channel that represents degrees of transparency of the set of pixels. Storing the compressed data can include writing the compressed data to portions of a cache. The processing units can identify a subset of the set of pixels that share a value of a first channel of the channels and represent the value of the first channel over the subset of the set of pixels using information representing the value, the first channel, and boundaries of the subset.

    DEAD SURFACE INVALIDATION
    Invention Publication

    Publication Number: US20230206384A1

    Publication Date: 2023-06-29

    Application Number: US17563950

    Application Date: 2021-12-28

    CPC classification number: G06T1/60 G06F12/0891 G06T1/20 G06F2212/455

    Abstract: Systems, apparatuses, and methods for performing dead surface invalidation are disclosed. An application sends draw call commands to a graphics processing unit (GPU) via a driver, with the draw call commands rendering to surfaces. After it is determined that a given surface will no longer be accessed by subsequent draw calls, the application sends a surface invalidation command for the given surface to a command processor of the GPU. After the command processor receives the surface invalidation command, the command processor waits for a shader engine to send a draw call completion message for a last draw call to access the given surface. Once the command processor receives the draw call completion message, the command processor sends a surface invalidation command to a cache to invalidate cache lines for the given surface to free up space in the cache for other data.

    STOCHASTIC OPTIMIZATION OF SURFACE CACHEABILITY IN PARALLEL PROCESSING UNITS

    Publication Number: US20230195639A1

    Publication Date: 2023-06-22

    Application Number: US17557475

    Application Date: 2021-12-21

    CPC classification number: G06F12/0893 G06F2212/6042

    Abstract: A processing system selectively allocates storage at a local cache of a parallel processing unit for cache lines of a repeating pattern of data that exceeds the storage capacity of the cache. The processing system identifies repeating patterns of data having cache lines that have a reuse distance that exceeds the storage capacity of the cache. A cache controller allocates storage for only a subset of cache lines of the repeating pattern of data at the cache and excludes the remainder of cache lines of the repeating pattern of data from the cache. By restricting the cache to store only a subset of cache lines of the repeating pattern of data, the cache controller increases the hit rate at the cache for the subset of cache lines.
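
    A rough C++ sketch of the admission decision, assuming the reuse distance of the repeating pattern is known per surface and using a hash of the line address as a stand-in for the selection of which cache lines to admit; the names and thresholds are assumptions, not the claimed mechanism.

        // Illustrative only: admit a fraction of a surface's lines when its reuse
        // distance exceeds the cache capacity, so the admitted subset keeps hitting.
        #include <cstdint>
        #include <functional>

        struct SurfaceInfo {
            uint64_t reuseDistanceBytes;    // bytes touched before a line is reused
            uint64_t cacheCapacityBytes;    // capacity of the local cache
        };

        bool shouldAllocate(const SurfaceInfo& s, uint64_t lineAddress) {
            if (s.reuseDistanceBytes <= s.cacheCapacityBytes)
                return true;   // the whole pattern fits, cache every line
            // Fraction of the pattern's lines the cache can hold per reuse interval.
            const double fraction = static_cast<double>(s.cacheCapacityBytes) /
                                    static_cast<double>(s.reuseDistanceBytes);
            const uint64_t h = std::hash<uint64_t>{}(lineAddress);
            return (h % 1000) < static_cast<uint64_t>(fraction * 1000.0);
        }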

    VARIABLE DISPATCH WALK FOR SUCCESSIVE CACHE ACCESSES

    Publication Number: US20230195626A1

    Publication Date: 2023-06-22

    Application Number: US17558008

    Application Date: 2021-12-21

    CPC classification number: G06F12/0806 G06F12/10 G06F2212/1016

    Abstract: A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.
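
    An illustrative C++ sketch of two walk-order translations, using Morton (Z-order) indexing as one possible space-filling curve and a simple index reversal for the reverse typewriter walk; the actual dispatch-walk hardware is not described here.

        // Illustrative only: alternative walk orders for a 2D dispatch of work items.
        #include <cstdint>

        // Interleave the bits of x and y to obtain a Morton (Z-order) index, which
        // keeps nearby work items close together for read-after-read locality.
        uint32_t mortonIndex(uint16_t x, uint16_t y) {
            uint32_t idx = 0;
            for (int b = 0; b < 16; ++b) {
                idx |= ((static_cast<uint32_t>(x) >> b) & 1u) << (2 * b);
                idx |= ((static_cast<uint32_t>(y) >> b) & 1u) << (2 * b + 1);
            }
            return idx;
        }

        // Reverse typewriter order: the i-th work item of the consuming dispatch
        // starts where the producing dispatch finished, for read-after-write locality.
        uint32_t reverseTypewriterIndex(uint32_t i, uint32_t totalItems) {
            return totalItems - 1 - i;
        }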

    TRAVERSAL RECURSION FOR ACCELERATION STRUCTURE TRAVERSAL

    Publication Number: US20240370965A1

    Publication Date: 2024-11-07

    Application Number: US18373004

    Application Date: 2023-09-26

    Abstract: A processing unit includes traversal recursion circuitry that performs, on behalf of a software shader, at least some of the requisite actions for traversing selected types of nodes of an acceleration structure. In response to identifying that a first node of a raytracing acceleration structure is of a first type, the processing unit provides an intersection result for the first node to the recursion circuitry. In response to the intersection result for the first node, the processing unit performs a traversal operation for the raytracing acceleration structure at the recursion circuitry.
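
    A schematic C++ sketch of the split between recursion circuitry and shader, assuming interior (box) nodes are the node type the circuitry traverses on its own and other nodes are deferred back to the software shader; the node types and the intersection test are placeholders.

        // Illustrative only: hardware-style recursion over supported node types.
        #include <cstdint>
        #include <vector>

        enum class NodeType { Box, Triangle, Procedural };

        struct Node {
            NodeType type;
            std::vector<uint32_t> children;   // child node indices (empty for leaves)
        };

        struct Intersection { bool hit; };

        // Stand-in for the fixed-function ray/node intersection test.
        Intersection intersect(const Node&) { return {true}; }

        // Box nodes recurse without returning to the shader; other node types are
        // collected so the software shader can process them.
        void traverse(const std::vector<Node>& bvh, uint32_t nodeIndex,
                      std::vector<uint32_t>& deferredToShader) {
            const Node& node = bvh[nodeIndex];
            if (!intersect(node).hit)
                return;
            if (node.type == NodeType::Box) {
                for (uint32_t child : node.children)
                    traverse(bvh, child, deferredToShader);   // recursion circuitry path
            } else {
                deferredToShader.push_back(nodeIndex);        // shader handles this node
            }
        }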
