Patent search ap:("ADVANCED MICRO DEVICES INC.") AND inv:"Bradford Michael Beckmann"

11.

Invention application
RESOURCE-AWARE COMPRESSION (In force)

Publication number: US20230110376A1

Publication date: 2023-04-13

Application number: US18058534

Filing date: 2022-11-23

Applicant: Advanced Micro Devices, Inc.

Inventor: SeyedMohammad SeyedzadehDelcheh, Shomit N. Das, Bradford Michael Beckmann

IPC: G06F12/0871, G06F11/30, G06F12/0897, G06F12/02

Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.
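Illustrative sketch (not taken from the patent): the abstract says the choice between the light and heavy compressor depends on resource availability such as cache capacity or memory bandwidth. The Python below shows one way such a decision could look. The threshold values, the rule "use the heavy compressor when either resource is scarce", and the use of zlib levels 1 and 9 as stand-ins for the two compressors are all assumptions made for illustration.

```python
import zlib

LIGHT_LEVEL = 1  # assumed stand-in for the light compressor
HEAVY_LEVEL = 9  # assumed stand-in for the heavy compressor


def choose_compressor(cache_occupancy, bandwidth_utilization,
                      occupancy_threshold=0.9, bandwidth_threshold=0.8):
    """Pick 'heavy' when a resource is scarce, else 'light'.

    Both inputs are fractions in [0, 1]. The thresholds are hypothetical.
    """
    scarce = (cache_occupancy >= occupancy_threshold
              or bandwidth_utilization >= bandwidth_threshold)
    return "heavy" if scarce else "light"


def compress_line(line, cache_occupancy, bandwidth_utilization):
    """Compress one cache line with the compressor chosen above."""
    kind = choose_compressor(cache_occupancy, bandwidth_utilization)
    level = HEAVY_LEVEL if kind == "heavy" else LIGHT_LEVEL
    return kind, zlib.compress(line, level)
```

With these assumed thresholds, a line written while memory bandwidth is 90% utilized goes to the heavy compressor, while a line written to a mostly idle system takes the light, low-latency path.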

12.

Granted invention patent
Continuation analysis tasks for GPU task scheduling (In force)

Publication number: US11544106B2

Publication date: 2023-01-03

Application number: US16846654

Filing date: 2020-04-13

Applicant: Advanced Micro Devices, Inc.

Inventor: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor

IPC: G06F9/48, G06F9/38, G06F9/50, G06F9/52

Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.
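Illustrative sketch (not taken from the patent): the flow in the abstract, where a finished task enqueues a continuation packet and the continuation's analysis phase decides which dependent tasks to enqueue, can be modeled in software as follows. The single queue, the `deps` dictionary, and the function name `run_with_continuations` are hypothetical simplifications; the abstract describes multiple queues and hardware acceleration.

```python
from collections import deque


def run_with_continuations(tasks, deps, initial):
    """Run tasks, using continuation packets to release dependents.

    tasks:   name -> zero-argument callable
    deps:    name -> set of task names that must finish first
    initial: names of tasks that are ready at start
    Returns the order in which tasks executed.
    """
    queue = deque(("task", name) for name in initial)
    launched = set(initial)
    done = set()
    order = []
    while queue:
        kind, name = queue.popleft()
        if kind == "task":
            tasks[name]()
            done.add(name)
            order.append(name)
            # On completion, the task enqueues its continuation packet.
            queue.append(("continuation", name))
        else:
            # Analysis phase: find dependents whose prerequisites are all done.
            for candidate, prereqs in deps.items():
                if candidate not in launched and prereqs <= done:
                    launched.add(candidate)
                    queue.append(("task", candidate))
    return order
```

For example, with tasks A and B initially ready and C depending on both, C is enqueued only by a continuation that runs after A and B have both completed.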

13.

Invention application
Management of Thrashing in a GPU (In force)

Publication number: US20220206876A1

Publication date: 2022-06-30

Application number: US17136738

Filing date: 2020-12-29

Applicant: Advanced Micro Devices, Inc.

Inventor: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle

IPC: G06F9/52, G06F9/30, G06T1/20

Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.
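Illustrative sketch (not taken from the patent): the control-unit policy in the abstract amounts to switching from a first wavefront limit to a smaller second limit when register-file thrashing exceeds a threshold. In the Python below, thrashing is approximated by a count of in-use registers that spill to memory; the function name and that single-counter metric are assumptions.

```python
def allowed_wavefronts(first_limit, second_limit, spilled_registers, threshold):
    """Return how many wavefronts may execute concurrently.

    first_limit:       normal concurrency limit
    second_limit:      reduced limit, must be smaller than first_limit
    spilled_registers: in-use registers that spill to memory (assumed metric)
    threshold:         thrashing level above which throttling applies
    """
    if second_limit >= first_limit:
        raise ValueError("second_limit must be less than first_limit")
    if spilled_registers > threshold:
        return second_limit
    return first_limit
```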

14.

Invention application
MEMORY REQUEST PRIORITY ASSIGNMENT TECHNIQUES FOR PARALLEL PROCESSORS (In force)

Publication number: US20210173796A1

Publication date: 2021-06-10

Application number: US16706421

Filing date: 2019-12-06

Applicant: Advanced Micro Devices, Inc.

Inventor: Sooraj Puthoor, Kishore Punniyamurthy, Onur Kayiran, Xianwei Zhang, Yasuko Eckert, Johnathan Alsop, Bradford Michael Beckmann

IPC: G06F13/18, G06F13/16

Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
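Illustrative sketch (not taken from the patent): the abstract describes indexing into a priority vector with lane-specific information and, after a given event, deriving a second vector by applying a promotion vector. The Python below assumes the lane ID is the lane-specific information, that "applying" the promotion vector means element-wise addition, and that priorities saturate at `max_priority`. All three are assumptions.

```python
def assign_priorities(priority_vector, lane_ids):
    """Give each lane's memory request a priority by indexing the vector."""
    size = len(priority_vector)
    return [priority_vector[lane % size] for lane in lane_ids]


def promote(priority_vector, promotion_vector, max_priority=3):
    """Build the second priority vector from the first.

    Assumes element-wise addition with saturation at max_priority.
    """
    return [min(p + bump, max_priority)
            for p, bump in zip(priority_vector, promotion_vector)]
```

In this model, requests from subsequent wavefronts would call `assign_priorities` with the vector returned by `promote`, so previously low-priority lanes are served sooner.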

15.

Invention application
Low Latency Offloading of Collectives over a Switch (In force)

Publication number: US20250077409A1

Publication date: 2025-03-06

Application number: US18240640

Filing date: 2023-08-31

Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventor: Kishore Punniyamurthy, Richard David Sodke, Furkan Eris, Sergey Blagodurov, Bradford Michael Beckmann, Brandon Keith Potter, Khaled Hamidouche

IPC: G06F12/02, G06F13/16

Abstract: A device includes a plurality of processing elements (PEs). A symmetric memory is allocated in each of the plurality of PEs. The device includes a switch connected to the plurality of PEs. The switch is to: receive, from a first processing element (PE) of the plurality of PEs, a message that includes a buffer offset, compute, based on the buffer offset, a first memory address of a first buffer in a first symmetric memory of the first PE and a second memory address of a second buffer in a second symmetric memory of a second PE of the plurality of PEs, and initiate, based on the first memory address and the second memory address, a direct memory access operation to access the first buffer and the second buffer.
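Illustrative sketch (not taken from the patent): because the symmetric memory is allocated in every PE, a single buffer offset is enough to locate the corresponding buffer on each PE. The Python below assumes the switch knows a base address for each PE's symmetric region and computes each address as base plus offset; the names `symmetric_bases` and `buffer_addresses` are hypothetical.

```python
def buffer_addresses(symmetric_bases, buffer_offset, pe_ids):
    """Map one buffer offset to a memory address on each listed PE.

    symmetric_bases: PE id -> base address of that PE's symmetric memory
                     (assumed to be known to the switch)
    buffer_offset:   offset carried in the message from the first PE
    pe_ids:          PEs whose buffers the DMA operation will access
    """
    return {pe: symmetric_bases[pe] + buffer_offset for pe in pe_ids}
```

The resulting addresses for the first and second PE are what the direct memory access operation would then use.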

16.

Granted invention patent
Multi-kernel wavefront scheduler (In force)

Publication number: US12099867B2

Publication date: 2024-09-24

Application number: US15993061

Filing date: 2018-05-30

Applicant: Advanced Micro Devices, Inc.

Inventor: Sooraj Puthoor, Joseph Gross, Xulong Tang, Bradford Michael Beckmann

IPC: G06F9/46, G06F9/48

CPC classification number: G06F9/4881

Abstract: Systems, apparatuses, and methods for implementing a multi-kernel wavefront scheduler are disclosed. A system includes at least a parallel processor coupled to one or more memories, wherein the parallel processor includes a command processor and a plurality of compute units. The command processor launches multiple kernels for execution on the compute units. Each compute unit includes a multi-level scheduler for scheduling wavefronts from multiple kernels for execution on its execution units. A first level scheduler creates scheduling groups by grouping together wavefronts based on the priority of their kernels. Accordingly, wavefronts from kernels with the same priority are grouped together in the same scheduling group by the first level scheduler. Next, the first level scheduler selects, from a plurality of scheduling groups, the highest priority scheduling group for execution. Then, a second level scheduler schedules wavefronts for execution from the scheduling group selected by the first level scheduler.
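Illustrative sketch (not taken from the patent): the two scheduling levels in the abstract can be modeled as a grouping step followed by a selection step. In the Python below, the first level groups wavefronts by kernel priority and picks the highest-priority group. The second level's oldest-first ordering and the dictionary layout of a wavefront (`id`, `priority`, `age`) are assumptions, since the abstract does not say how wavefronts are ordered inside a group.

```python
from collections import defaultdict


def first_level(wavefronts):
    """Group wavefronts by kernel priority; return the top-priority group."""
    groups = defaultdict(list)
    for wf in wavefronts:
        groups[wf["priority"]].append(wf)
    if not groups:
        return []
    return groups[max(groups)]


def second_level(group):
    """Order wavefronts within the selected group, oldest first (assumed)."""
    return sorted(group, key=lambda wf: wf["age"], reverse=True)


def schedule(wavefronts):
    """Return wavefront ids in the order they would be issued."""
    return [wf["id"] for wf in second_level(first_level(wavefronts))]
```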

17.

Invention publication
Permute Instructions for Register-Based Lookups (Under examination, published)

Publication number: US20240220247A1

Publication date: 2024-07-04

Application number: US18148873

Filing date: 2022-12-30

Applicant: Advanced Micro Devices, Inc.

Inventor: Vadim Vadimovich Nikiforov, Yasuko Eckert, Bradford Michael Beckmann

IPC: G06F9/30

CPC classification number: G06F9/30127, G06F9/30134

Abstract: Permute instructions for register-based lookups is described. In accordance with the described techniques, a processor is configured to perform a register-based lookup by retrieving a first result from a first lookup table based on a subset of bits included in an index of a destination register, retrieving a second result from a second lookup table based on the subset of bits included in the index of the destination register, selecting the first result or the second result based on a bit in the index of the destination register that is excluded from the subset of bits, and overwriting data included in the index of the destination register using a selected one of the first result or the second result.
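Illustrative sketch (not taken from the patent): the lookup in the abstract reads two tables with the same subset of index bits, uses an excluded bit to pick one of the two results, and writes the selection back over the index. The Python below models the destination register as a list of lane indices. It assumes the subset is the low `subset_bits` bits and that the selecting bit is the next higher bit; both choices are assumptions.

```python
def permute_lookup(dest_register, first_table, second_table, subset_bits):
    """Replace each index in dest_register with its looked-up value.

    dest_register: list of integer indices, overwritten in place
    first_table, second_table: lookup tables with 2**subset_bits entries
    subset_bits:   number of low bits used to index both tables (assumed)
    """
    mask = (1 << subset_bits) - 1
    results = []
    for index in dest_register:
        low = index & mask
        first_result = first_table[low]
        second_result = second_table[low]
        # The bit just above the subset selects between the two results.
        select = (index >> subset_bits) & 1
        results.append(second_result if select else first_result)
    dest_register[:] = results
    return dest_register
```

Splitting a table of 2**(subset_bits + 1) entries across two smaller tables this way lets each half be read with the same narrow index.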

18.

Granted invention patent
Management of thrashing in a GPU (In force)

Publication number: US11875197B2

Publication date: 2024-01-16

Application number: US17136738

Filing date: 2020-12-29

Applicant: Advanced Micro Devices, Inc.

Inventor: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle

IPC: G06F9/38, G06F9/52, G06T1/20, G06F9/30

CPC classification number: G06F9/52, G06F9/30141, G06F9/3836, G06T1/20

Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.

19.

Granted invention patent
Resource-aware compression (In force)

Publication number: US11544196B2

Publication date: 2023-01-03

Application number: US16725971

Filing date: 2019-12-23

Applicant: Advanced Micro Devices, Inc.

Inventor: SeyedMohammad SeyedzadehDelcheh, Shomit N. Das, Bradford Michael Beckmann

IPC: G06F12/08, G06F12/0871, G06F12/0897, G06F11/30, G06F12/02

Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.

20.

Granted invention patent
Memory request priority assignment techniques for parallel processors (In force)

Publication number: US11507522B2

Publication date: 2022-11-22

Application number: US16706421

Filing date: 2019-12-06

Applicant: Advanced Micro Devices, Inc.

Inventor: Sooraj Puthoor, Kishore Punniyamurthy, Onur Kayiran, Xianwei Zhang, Yasuko Eckert, Johnathan Alsop, Bradford Michael Beckmann

IPC: G06F13/18, G06F13/16

Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
