-
Publication No.: US12130741B2
Publication Date: 2024-10-29
Application No.: US18058534
Filing Date: 2022-11-23
Applicant: Advanced Micro Devices, Inc.
IPC: G06F12/02 , G06F11/30 , G06F12/0871 , G06F12/0897
CPC classification number: G06F12/0871 , G06F11/3037 , G06F12/0246 , G06F12/0897 , G06F2212/401
Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.
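The compressor-selection decision described in this abstract can be sketched in software. This is a minimal illustrative model, not the patented hardware: the class name, the single bandwidth-utilization threshold, and the use of `zlib` levels to stand in for the light and heavy compressors are all assumptions.

```python
# Hypothetical sketch of the two-tier compression decision; names and the
# threshold are illustrative, not taken from the patent.
import zlib

def light_compress(data: bytes) -> bytes:
    # Fast, lower-ratio compression (low decompression latency).
    return zlib.compress(data, level=1)

def heavy_compress(data: bytes) -> bytes:
    # Slower, higher-ratio compression (higher decompression latency).
    return zlib.compress(data, level=9)

class TieredCacheController:
    """Chooses a compressor per cache line based on resource pressure."""

    def __init__(self, bandwidth_threshold: float = 0.75):
        self.bandwidth_threshold = bandwidth_threshold

    def compress_line(self, line: bytes, bandwidth_utilization: float):
        # Under high memory-bandwidth pressure, spend extra latency on the
        # heavy compressor to cut off-chip traffic; otherwise stay light.
        if bandwidth_utilization >= self.bandwidth_threshold:
            return ("heavy", heavy_compress(line))
        return ("light", light_compress(line))
```

The key point the sketch captures is that the choice is made per line at compression time from observed resource availability, so the heavy path is used only opportunistically.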
-
Publication No.: US20220206841A1
Publication Date: 2022-06-30
Application No.: US17136725
Filing Date: 2020-12-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Bradford Michael Beckmann , Steven Tony Tye , Brian L. Sumner , Nicolai Hähnle
Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
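The spill policy in this abstract (evict registers belonging to stack frames at the bottom of the stack first, since they are least likely to be used soon) can be modeled as a simple software analogue. The class and method names are assumptions of this sketch, not the patent's terminology.

```python
# Illustrative model of bottom-of-stack register spilling: when a new
# wavefront needs more physical registers than are free, the oldest frames'
# registers are spilled to backing memory first.
from collections import deque

class RegisterFileManager:
    def __init__(self, total_registers: int):
        self.free = total_registers
        self.frames = deque()  # leftmost entry = bottom of the stack

    def push_frame(self, frame_id: str, regs_needed: int) -> list:
        spilled = []
        # Spill bottom-of-stack frames until the new request fits.
        while self.free < regs_needed and self.frames:
            victim_id, victim_regs = self.frames.popleft()
            self.free += victim_regs
            spilled.append(victim_id)
        if self.free < regs_needed:
            raise MemoryError("register file too small for request")
        self.free -= regs_needed
        self.frames.append((frame_id, regs_needed))
        return spilled  # frames whose registers went to backing memory
```

Treating the physical register file this way is what the abstract means by managing it "as a cache": mappings are created on demand and old entries are victimized under pressure.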
-
Publication No.: US20210191869A1
Publication Date: 2021-06-24
Application No.: US16725971
Filing Date: 2019-12-23
Applicant: Advanced Micro Devices, Inc.
IPC: G06F12/0871 , G06F12/0897 , G06F12/02 , G06F11/30
Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.
-
Publication No.: US20200379802A1
Publication Date: 2020-12-03
Application No.: US16846654
Filing Date: 2020-04-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Steven Tony Tye , Brian L. Sumner , Bradford Michael Beckmann , Sooraj Puthoor
Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.
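The continuation-packet flow in this abstract (task completes, enqueues a continuation packet on a chosen queue, the queue's agent dequeues it and runs an analysis phase that enqueues newly-ready dependent tasks) can be traced with a small software analogue. The queue class, packet fields, and analysis callback here are illustrative assumptions, not the hardware interface.

```python
# Minimal software analogue of the continuation analysis task (CAT) flow.
from collections import deque

class Queue:
    def __init__(self):
        self.items = deque()
    def enqueue(self, pkt): self.items.append(pkt)
    def dequeue(self): return self.items.popleft()

def run_first_task(cont_queue: Queue):
    # On completion, the first task enqueues its continuation packet on a
    # queue of its choosing.
    result = "first-done"
    cont_queue.enqueue({"analysis": analyze, "result": result})
    return result

def analyze(result, task_queue: Queue):
    # Analysis phase: decide which dependent tasks are now ready to launch.
    if result == "first-done":
        task_queue.enqueue({"name": "second"})

def agent_step(cont_queue: Queue, task_queue: Queue):
    # The agent owning the queue dequeues and executes the continuation,
    # which may enqueue newly-ready dependent tasks.
    pkt = cont_queue.dequeue()
    pkt["analysis"](pkt["result"], task_queue)
```

The analysis phase is the distinguishing step: dependent work is enqueued only after the continuation inspects the completed task's outcome, rather than being scheduled unconditionally.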
-
Publication No.: US20180349145A1
Publication Date: 2018-12-06
Application No.: US15607991
Filing Date: 2017-05-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Steven Tony Tye , Brian L. Sumner , Bradford Michael Beckmann , Sooraj Puthoor
CPC classification number: G06F9/505 , G06F9/5066 , G06F2209/509
Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.
-
Publication No.: US20240111591A1
Publication Date: 2024-04-04
Application No.: US17957907
Filing Date: 2022-09-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Bradford Michael Beckmann , Sooraj Puthoor
CPC classification number: G06F9/5038 , G06F9/3009 , G06F9/5072
Abstract: Portions of programs, oftentimes referred to as kernels, are written by programmers to target a particular type of compute unit, such as a central processing unit (CPU) core or a graphics processing unit (GPU) core. When executing a kernel, the kernel is separated into multiple parts referred to as workgroups, and each workgroup is provided to a compute unit for execution. Usage of one type of compute unit is monitored and, in response to the one type of compute unit being idle, one or more workgroups targeting another type of compute unit are executed on the one type of compute unit. For example, usage of CPU cores is monitored, and in response to the CPU cores being idle, one or more workgroups targeting GPU cores are executed on the CPU cores.
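The idle-core reassignment in this abstract can be summarized as a placement decision per workgroup. The sketch below is a toy dispatcher under simplifying assumptions (one stolen workgroup saturates the idle CPU; workgroups are plain dicts); it is not the patented scheduler.

```python
# Toy dispatcher: workgroups target one compute-unit type, but an idle unit
# of the other type can pick up a workgroup too. Names are illustrative.
def dispatch(workgroups, cpu_idle: bool):
    """Return a mapping of workgroup id -> unit type that will execute it."""
    placement = {}
    for wg in workgroups:
        target = wg["target"]
        if target == "gpu" and cpu_idle:
            # Idle CPU cores steal a GPU-targeted workgroup.
            placement[wg["id"]] = "cpu"
            cpu_idle = False  # assume one stolen workgroup occupies the CPU
        else:
            placement[wg["id"]] = target
    return placement
```

The monitoring loop the abstract describes would feed the `cpu_idle` observation into this decision; here it is simply passed in as a flag.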
-
Publication No.: US20230153149A1
Publication Date: 2023-05-18
Application No.: US18154012
Filing Date: 2023-01-12
Applicant: Advanced Micro Devices, Inc.
Inventor: Bradford Michael Beckmann , Steven Tony Tye , Brian L. Sumner , Nicolai Hähnle
CPC classification number: G06F9/4843 , G06F9/3836 , G06F15/80 , G06F11/3024 , G06F11/3006 , G06F9/30098
Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
-
Publication No.: US11579922B2
Publication Date: 2023-02-14
Application No.: US17136725
Filing Date: 2020-12-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Bradford Michael Beckmann , Steven Tony Tye , Brian L. Sumner , Nicolai Hähnle
Abstract: Systems, apparatuses, and methods for dynamic graphics processing unit (GPU) register allocation are disclosed. A GPU includes at least a plurality of compute units (CUs), a control unit, and a plurality of registers for each CU. If a new wavefront requests more registers than are currently available on the CU, the control unit spills registers associated with stack frames at the bottom of a stack since they will not likely be used in the near future. The control unit has complete flexibility determining how many registers to spill based on dynamic demands and can prefetch the upcoming necessary fills without software involvement. Effectively, the control unit manages the physical register file as a cache. This allows younger workgroups to be dynamically descheduled so that older workgroups can allocate additional registers when needed to ensure improved fairness and better forward progress guarantees.
-
Publication No.: US20190187964A1
Publication Date: 2019-06-20
Application No.: US15848476
Filing Date: 2017-12-20
Applicant: Advanced Micro Devices, Inc.
IPC: G06F8/41
CPC classification number: G06F8/4434 , G06F8/433
Abstract: Systems, apparatuses, and methods for converting computer program source code from a first high level language to a functionally equivalent executable program code. Source code in a first high level language is analyzed by a code compilation tool. In response to identifying a potential bank conflict in a multi-bank register file, operands of one or more instructions are remapped such that they map to different physical banks of the multi-bank register file. Identifying a potential bank conflict comprises one or more of identifying an intra-instruction bank conflict, an inter-instruction bank conflict, and identifying a multi-word operand with a potential bank conflict.
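The intra-instruction conflict case from this abstract can be illustrated with a small model: two source operands of one instruction that live in the same physical bank conflict, and the compiler remaps one of them to a register in a different bank. The 4-bank layout, the `register mod 4` bank function, and the greedy remap strategy are assumptions of this sketch.

```python
# Sketch of intra-instruction bank-conflict detection and operand remapping,
# assuming a 4-bank register file with bank = register number mod 4.
NUM_BANKS = 4

def bank(reg: int) -> int:
    return reg % NUM_BANKS

def has_intra_conflict(operands: list) -> bool:
    # Two operands of one instruction reading the same bank conflict.
    banks = [bank(r) for r in operands]
    return len(banks) != len(set(banks))

def remap(operands: list) -> list:
    # Greedily reassign conflicting operands to registers in unused banks.
    used_banks, result = set(), []
    next_reg = max(operands) + 1
    for r in operands:
        if bank(r) in used_banks:
            # Find a replacement register that lands in an unused bank.
            while bank(next_reg) in used_banks:
                next_reg += 1
            r = next_reg
            next_reg += 1
        used_banks.add(bank(r))
        result.append(r)
    return result
```

In a real compiler the remap must also rewrite the instructions that define the moved operand; the sketch only shows the bank-assignment decision itself.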
-
Publication No.: US20230315536A1
Publication Date: 2023-10-05
Application No.: US17708021
Filing Date: 2022-03-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Mark Wyse , Bradford Michael Beckmann , John Kalamatianos , Anthony Thomas Gutierrez
IPC: G06F9/50
CPC classification number: G06F9/5077
Abstract: To reduce inter- and intra-instruction register bank access conflicts in parallel processors, a processing system includes a remapping circuit to dynamically remap virtual registers to physical registers of a parallel processor during execution of a wavefront. The remapping circuit remaps virtual registers to physical registers at a register mapping table that holds the current set of virtual to physical register mappings based on a list of available registers indicating which physical registers are available for a new mapping and a register mapping policy.
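The remapping circuit in this abstract maintains a register mapping table and a free list, and allocates new virtual-to-physical mappings according to a policy. The software model below assumes a lowest-numbered-free-register policy for illustration; the class name and methods are not from the patent.

```python
# Software model of dynamic virtual-to-physical register remapping:
# a mapping table plus a free list of available physical registers.
class RegisterRemapper:
    def __init__(self, num_physical: int):
        self.mapping = {}                      # virtual -> physical table
        self.free = list(range(num_physical))  # available physical registers

    def translate(self, vreg: int) -> int:
        # Reuse an existing mapping, else allocate per the mapping policy
        # (here: lowest-numbered free physical register).
        if vreg not in self.mapping:
            if not self.free:
                raise RuntimeError("no physical registers available")
            self.mapping[vreg] = self.free.pop(0)
        return self.mapping[vreg]

    def release(self, vreg: int):
        # Return the physical register to the free list once the wavefront
        # no longer needs the virtual register.
        self.free.append(self.mapping.pop(vreg))
```

A bank-aware policy would additionally consult the banks of registers already mapped for the same instruction before choosing from the free list, which is how remapping reduces the conflicts the abstract targets.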
-