ENHANCED ATOMICS FOR WORKGROUP SYNCHRONIZATION

    Publication No.: US20210096909A1

    Publication date: 2021-04-01

    Application No.: US16588872

    Filing date: 2019-09-30

    Abstract: A technique for synchronizing workgroups is provided. The technique comprises detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.
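
    The queueing and scheduling steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `ReadyQueues`, the status labels, and the convention that a lower number means higher priority are all assumptions made for illustration.

```python
from collections import deque


class ReadyQueues:
    """Hypothetical model: one FIFO ready queue per synchronization status."""

    def __init__(self, priorities):
        # priorities maps a status label to a number; lower runs first (assumption).
        self._order = sorted(priorities, key=priorities.get)
        self._queues = {status: deque() for status in priorities}

    def make_ready(self, workgroup, status):
        # Place a non-executing workgroup into the queue matching its status.
        self._queues[status].append(workgroup)

    def schedule(self, free_slots):
        # Fill the available slots, draining higher-priority queues first.
        scheduled = []
        for status in self._order:
            queue = self._queues[status]
            while queue and len(scheduled) < free_slots:
                scheduled.append(queue.popleft())
        return scheduled


# Status labels below are invented for the example.
queues = ReadyQueues({"in_critical_section": 0, "not_synchronizing": 1})
queues.make_ready("wg_a", "not_synchronizing")
queues.make_ready("wg_b", "in_critical_section")
queues.make_ready("wg_c", "not_synchronizing")
first_batch = queues.schedule(free_slots=2)
second_batch = queues.schedule(free_slots=2)
```

    In this sketch, `wg_b` is scheduled ahead of `wg_a` even though it became ready later, because its queue has the higher priority.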

    CONDITIONAL ATOMIC OPERATIONS AT A PROCESSOR
    Patent application (under examination, published)

    Publication No.: US20160357551A1

    Publication date: 2016-12-08

    Application No.: US14728643

    Filing date: 2015-06-02

    Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory location stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of a wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.
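
    The flow described in this abstract can be approximated by a sequential Python simulation. It is only a sketch under stated assumptions: `Memory`, `compare_and_swap`, and `conditional_fetch_and_phi` are hypothetical names, the CAS is a plain function rather than a hardware atomic, lane 0 is assumed to be the elected thread, and returning `None` to every lane on failure is an assumption the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical one-word cell standing in for a GPU memory location."""
    value: int


def compare_and_swap(mem, expected, new):
    # Sequential stand-in for a hardware CAS; returns the value seen before any swap.
    old = mem.value
    if old == expected:
        mem.value = new
    return old


def conditional_fetch_and_phi(mem, expected, phi, num_lanes):
    # A single elected lane (lane 0 here, by assumption) issues one CAS
    # on behalf of the whole wavefront; the other lanes only wait.
    old = compare_and_swap(mem, expected, phi(expected))
    succeeded = old == expected
    # Broadcast: on success every lane receives the fetched value.
    # On failure every lane receives None (assumption for this sketch).
    result = old if succeeded else None
    return [result for _ in range(num_lanes)]


cell = Memory(value=5)
first = conditional_fetch_and_phi(cell, expected=5, phi=lambda v: v + 1, num_lanes=4)
second = conditional_fetch_and_phi(cell, expected=5, phi=lambda v: v + 1, num_lanes=4)
```

    The first call finds the expected value, updates the cell once, and hands the fetched value to all four lanes. The second call finds a different value, so the cell is left unchanged.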


    Device and method for data compression using a metadata cache

    Publication No.: US11604738B2

    Publication date: 2023-03-14

    Application No.: US16146543

    Filing date: 2018-09-28

    Abstract: A processing device is provided which includes memory comprising data cache memory configured to store compressed data and metadata cache memory configured to store metadata, each portion of metadata comprising an encoding used to compress a portion of data. The processing device also includes at least one processor configured to compress portions of data and select, based on one or more utility level metrics, portions of metadata to be stored in the metadata cache memory. The at least one processor is also configured to store, in the metadata cache memory, the portions of metadata selected to be stored in the metadata cache memory, and to store, in the data cache memory, each portion of compressed data having a selected portion of corresponding metadata stored in the metadata cache memory. Each portion of compressed data, having the selected portion of corresponding metadata stored in the metadata cache memory, is decompressed.

    Byte select cache compression
    Granted patent

    Publication No.: US10860489B2

    Publication date: 2020-12-08

    Application No.: US16176828

    Filing date: 2018-10-31

    Abstract: Techniques are disclosed for designing cache compression algorithms that control how data in caches are compressed. The techniques generate a custom “byte select algorithm” by repeatedly applying transforms to an initial compression algorithm until a set of suitability criteria is met. The suitability criteria include that the “cost” is below a threshold and that a metadata constraint is met. The “cost” is the number of blocks that can be compressed by an algorithm as compared with the “ideal” algorithm. The metadata constraint is the number of bits required for metadata.

    Cooperative workgroup scheduling and context prefetching based on predicted modification of signal values

    Publication No.: US11481250B2

    Publication date: 2022-10-25

    Application No.: US16024244

    Filing date: 2018-06-29

    Abstract: A first workgroup is preempted in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal. The first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value. A second workgroup is scheduled for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value. A third context is prefetched into registers of the processor core based on the first hint and the second value. The first context is stored in a first portion of the registers and the second context is prefetched into a second portion of the registers prior to preempting the first workgroup.
