VARIABLE DISPATCH WALK FOR SUCCESSIVE CACHE ACCESSES

    公开(公告)号:US20230195626A1

    公开(公告)日:2023-06-22

    申请号:US17558008

    申请日:2021-12-21

    CPC classification number: G06F12/0806 G06F12/10 G06F2212/1016

    Abstract: A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.

    Dynamic modification of coherent atomic memory operations

    公开(公告)号:US11604737B1

    公开(公告)日:2023-03-14

    申请号:US17516860

    申请日:2021-11-02

    Abstract: A processing device determines a scope indicating at least a portion of the processing system and target data from atomic memory operation to be performed. Based on the scope, the processing device determines one or more hardware parameters for at least a portion of the processing system. The processing device then compares the hardware parameters to the scope and target data to determine one or more corrections. The processing device then provides the scope, target data, hardware parameters, and corrections to a plurality of hardware lookup tables. The hardware lookup tables are configured to receive the scope, target data, hardware parameters, and corrections as inputs and output values indicating one or more coherency actions and one or more orderings. The processing device then executes one or more of the indicated coherency actions and the atomic memory operation based on the indicated ordering.

    Residency map descriptors
    13.
    发明授权

    公开(公告)号:US11521342B2

    公开(公告)日:2022-12-06

    申请号:US17230140

    申请日:2021-04-14

    Abstract: A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource.

    VMID AS A GPU TASK CONTAINER FOR VIRTUALIZATION

    公开(公告)号:US20200042348A1

    公开(公告)日:2020-02-06

    申请号:US16050948

    申请日:2018-07-31

    Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.

    Residency map descriptors
    15.
    发明授权

    公开(公告)号:US10540802B1

    公开(公告)日:2020-01-21

    申请号:US16263986

    申请日:2019-01-31

    Abstract: A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource.

    SEPARATE TRACKING OF PENDING LOADS AND STORES

    公开(公告)号:US20180246724A1

    公开(公告)日:2018-08-30

    申请号:US15442412

    申请日:2017-02-24

    Abstract: Systems, apparatuses, and methods for maintaining separate pending load and store counters are disclosed herein. In one embodiment, a system includes at least one execution unit, a memory subsystem, and a pair of counters for each thread of execution. In one embodiment, the system implements a software based approach for managing dependencies between instructions. In one embodiment, the execution unit(s) maintains counters to support the software-based approach for managing dependencies between instructions. The execution unit(s) are configured to execute instructions that are used to manage the dependencies during run-time. In one embodiment, the execution unit(s) execute wait instructions to wait until a given counter is equal to a specified value before continuing to execute the instruction sequence.

Patent Agency Ranking