CONTINUATION ANALYSIS TASKS FOR GPU TASK SCHEDULING

    公开(公告)号:US20200379802A1

    公开(公告)日:2020-12-03

    申请号:US16846654

    申请日:2020-04-13

    Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.

    CONTINUATION ANALYSIS TASKS FOR GPU TASK SCHEDULING

    公开(公告)号:US20180349145A1

    公开(公告)日:2018-12-06

    申请号:US15607991

    申请日:2017-05-30

    CPC classification number: G06F9/505 G06F9/5066 G06F2209/509

    Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.

    Management of thrashing in a GPU
    13.
    发明授权

    公开(公告)号:US11875197B2

    公开(公告)日:2024-01-16

    申请号:US17136738

    申请日:2020-12-29

    CPC classification number: G06F9/52 G06F9/30141 G06F9/3836 G06T1/20

    Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.

    Dynamic modification of coherent atomic memory operations

    公开(公告)号:US11604737B1

    公开(公告)日:2023-03-14

    申请号:US17516860

    申请日:2021-11-02

    Abstract: A processing device determines a scope indicating at least a portion of the processing system and target data from atomic memory operation to be performed. Based on the scope, the processing device determines one or more hardware parameters for at least a portion of the processing system. The processing device then compares the hardware parameters to the scope and target data to determine one or more corrections. The processing device then provides the scope, target data, hardware parameters, and corrections to a plurality of hardware lookup tables. The hardware lookup tables are configured to receive the scope, target data, hardware parameters, and corrections as inputs and output values indicating one or more coherency actions and one or more orderings. The processing device then executes one or more of the indicated coherency actions and the atomic memory operation based on the indicated ordering.

    LOADER AND RUNTIME OPERATIONS FOR HETEROGENEOUS CODE OBJECTS

    公开(公告)号:US20220171635A1

    公开(公告)日:2022-06-02

    申请号:US17673647

    申请日:2022-02-16

    Abstract: Described herein are techniques for executing a heterogeneous code object executable. According to the techniques, a loader identifies a first memory appropriate for loading a first architecture-specific portion of the heterogeneous code object executable, wherein the first architecture specific portion includes instructions for a first architecture, identifies a second memory appropriate for loading a second architecture-specific portion of the heterogeneous code object executable, wherein the second architecture specific portion includes instructions for a second architecture that is different than the first architecture, loads the first architecture-specific portion into the first memory and the second architecture-specific portion into the second memory, and performs relocations on the first architecture-specific portion and on the second architecture-specific portion.

Patent Agency Ranking