Resource-aware compression
    1.
    发明授权

    公开(公告)号:US12130741B2

    公开(公告)日:2024-10-29

    申请号:US18058534

    申请日:2022-11-23

    Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.

    DYNAMIC GRAPHICAL PROCESSING UNIT REGISTER ALLOCATION

    公开(公告)号:US20220206841A1

    公开(公告)日:2022-06-30

    申请号:US17136725

    申请日:2020-12-29

    Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.

    RESOURCE-AWARE COMPRESSION
    3.
    发明申请

    公开(公告)号:US20210191869A1

    公开(公告)日:2021-06-24

    申请号:US16725971

    申请日:2019-12-23

    Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.

    CONTINUATION ANALYSIS TASKS FOR GPU TASK SCHEDULING

    公开(公告)号:US20200379802A1

    公开(公告)日:2020-12-03

    申请号:US16846654

    申请日:2020-04-13

    Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.

    CONTINUATION ANALYSIS TASKS FOR GPU TASK SCHEDULING

    公开(公告)号:US20180349145A1

    公开(公告)日:2018-12-06

    申请号:US15607991

    申请日:2017-05-30

    CPC classification number: G06F9/505 G06F9/5066 G06F2209/509

    Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.

    Executing Kernel Workgroups Across Multiple Compute Unit Types

    公开(公告)号:US20240111591A1

    公开(公告)日:2024-04-04

    申请号:US17957907

    申请日:2022-09-30

    CPC classification number: G06F9/5038 G06F9/3009 G06F9/5072

    Abstract: Portions of programs, oftentimes referred to as kernels, are written by programmers to target a particular type of compute unit, such as a central processing unit (CPU) core or a graphics processing unit (GPU) core. When executing a kernel, the kernel is separated into multiple parts referred to as workgroups, and each workgroup is provided to a compute unit for execution. Usage of one type of compute unit is monitored and, in response to the one type of compute unit being idle, one or more workgroups targeting another type of compute unit are executed on the one type of compute unit. For example, usage of CPU cores is monitored, and in response to the CPU cores being idle, one or more workgroups targeting GPU cores are executed on the CPU cores.

    DYNAMIC GRAPHICAL PROCESSING UNIT REGISTER ALLOCATION

    公开(公告)号:US20230153149A1

    公开(公告)日:2023-05-18

    申请号:US18154012

    申请日:2023-01-12

    Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.

    Dynamic graphical processing unit register allocation

    公开(公告)号:US11579922B2

    公开(公告)日:2023-02-14

    申请号:US17136725

    申请日:2020-12-29

    Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.

    Method and Apparatus for Compiler Driven Bank Conflict Avoidance

    公开(公告)号:US20190187964A1

    公开(公告)日:2019-06-20

    申请号:US15848476

    申请日:2017-12-20

    CPC classification number: G06F8/4434 G06F8/433

    Abstract: Systems, apparatuses, and methods for converting computer program source code from a first high level language to a functionally equivalent executable program code. Source code in a first high level language is analyzed by a code compilation tool. In response to identifying a potential bank conflict in a multi-bank register file, operands of one or more instructions are remapped such that they map to different physical banks of the multi-bank register file. Identifying a potential bank conflict comprises one or more of identifying an intra-instruction bank conflict, an inter-instruction bank conflict, and identifying a multi-word operand with a potential bank conflict.

Patent Agency Ranking