-
Publication No.: US20240111578A1
Publication Date: 2024-04-04
Application No.: US17957714
Filing Date: 2022-09-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Matthaeus G. Chajdas , Christopher J. Brennan , Michael Mantor , Robert W. Martin , Nicolai Haehnle
IPC: G06F9/48
CPC classification number: G06F9/4881
Abstract: A method for hierarchical work scheduling includes consuming a work item at a first scheduling domain having a local scheduler circuit and one or more workgroup processing elements. Consuming the work item produces a set of new work items. Subsequently, the local scheduler circuit distributes at least one new work item of the set of new work items to be executed locally at the first scheduling domain. If the local scheduler circuit of the first scheduling domain determines that the set of new work items includes one or more work items that would overload the first scheduling domain with work if scheduled for local execution, those work items are distributed to the next higher-level scheduler circuit in a scheduling domain hierarchy for redistribution to one or more other scheduling domains.
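As an illustrative sketch only (not the patented circuit), the overflow decision described in this abstract can be modeled as a local queue with a capacity threshold: new work items stay local until the domain would be overloaded, at which point they are escalated to the parent scheduler for redistribution. The class names and the least-loaded redistribution policy are assumptions for illustration.

```python
# Sketch of hierarchical work scheduling: a local scheduler keeps new work
# items unless they would overload its domain, escalating the excess to the
# next higher-level scheduler for redistribution to sibling domains.

class SchedulingDomain:
    def __init__(self, capacity, parent=None):
        self.capacity = capacity   # hypothetical max local queue depth
        self.queue = []
        self.parent = parent

    def submit(self, work_items):
        for item in work_items:
            if len(self.queue) < self.capacity:
                self.queue.append(item)          # schedule locally
            elif self.parent is not None:
                self.parent.redistribute(item)   # escalate to higher level
            else:
                self.queue.append(item)          # root: nowhere to escalate

class ParentScheduler:
    def __init__(self, domains):
        self.domains = domains

    def redistribute(self, item):
        # send the overflow item to the least-loaded sibling domain
        target = min(self.domains, key=lambda d: len(d.queue))
        target.queue.append(item)
```

With a capacity of 2, submitting three items to one domain keeps two locally and redistributes the third to the emptier sibling.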
-
Publication No.: US11609791B2
Publication Date: 2023-03-21
Application No.: US15828059
Filing Date: 2017-11-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Anirudh R. Acharya , Michael Mantor
Abstract: A first workload is executed in a first subset of pipelines of a processing unit. A second workload is executed in a second subset of the pipelines of the processing unit. The second workload is dependent upon the first workload. The first and second workloads are suspended, and state information for the first and second workloads is stored in a first memory in response to suspending the first and second workloads. In some cases, a third workload executes in a third subset of the pipelines of the processing unit concurrently with the first and second workloads. In some cases, a fourth workload is executed in the first and second subsets of pipelines after suspending the first and second workloads. The first and second workloads are resumed on the basis of the stored state information in response to completion or suspension of the fourth workload.
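A minimal sketch of the suspend/resume mechanism described above (illustrative only; the state dictionary and function names are assumptions, not the patent's implementation): state is saved to a "first memory" on suspension, the freed pipelines run another workload, and the original workloads are later restored from the saved state.

```python
# Sketch of workload suspend/resume: state is stored to memory when a
# workload is suspended, freeing its pipelines; the stored state is used
# to resume the workload after an interleaved workload completes.

class Pipeline:
    def __init__(self):
        self.workload = None
        self.state = None

saved = {}   # stands in for the "first memory" holding suspended state

def run(pipeline, workload):
    pipeline.workload = workload
    pipeline.state = {"pc": 0, "workload": workload}   # hypothetical state

def suspend(pipelines):
    for p in pipelines:
        saved[p.workload] = p.state    # store state before releasing
        p.workload = p.state = None

def resume(pipelines, workloads):
    for p, w in zip(pipelines, workloads):
        p.workload, p.state = w, saved.pop(w)   # restore from stored state
```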
-
Publication No.: US20230076872A1
Publication Date: 2023-03-09
Application No.: US17985674
Filing Date: 2022-11-11
Applicant: Advanced Micro Devices, Inc.
Inventor: Nuwan S. Jayasena , James Michael O'Connor , Michael Mantor
IPC: G06F12/0862 , G06F9/52 , G06F8/41
Abstract: Embodiments include methods, systems, and non-transitory computer-readable media including instructions for executing a prefetch kernel that includes memory accesses for prefetching data for a processing kernel into a memory, and, subsequent to executing at least a portion of the prefetch kernel, executing the processing kernel, where the processing kernel includes accesses to data stored into the memory as a result of executing the prefetch kernel.
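The prefetch-kernel idea can be sketched as follows (illustrative only; the set-based cache and the kernel functions are assumptions for the example): a lightweight kernel touches the addresses the processing kernel will need so they are already resident when the real work runs.

```python
# Sketch of a prefetch kernel: it performs only the memory accesses of the
# processing kernel, warming a cache; the processing kernel then finds its
# data already resident.

cache = set()
dram = {i: i * i for i in range(8)}   # hypothetical backing memory

def prefetch_kernel(addresses):
    for a in addresses:
        cache.add(a)                  # memory access only, no computation

def processing_kernel(addresses):
    hits = sum(1 for a in addresses if a in cache)
    values = [dram[a] for a in addresses]
    return values, hits
```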
-
Publication No.: US20220188076A1
Publication Date: 2022-06-16
Application No.: US17121354
Filing Date: 2020-12-14
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin He , Brian Emberling , Mark Leather , Michael Mantor
Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general-purpose register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefront operands supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
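A toy model of the operand-cache idea (illustrative only; the bank layout and function names are assumptions): operands are staged from VGPR banks into a small cache, and two ALU pipelines then issue in the same "cycle" without widening the VGPR read path.

```python
# Sketch: operands are collected from VGPR banks into a small cache so two
# ALU pipelines can each execute in the same cycle without extra VGPR ports.

vgpr_banks = [[10, 20], [30, 40]]     # hypothetical register banks
operand_cache = []

def collect(bank, reg):
    operand_cache.append(vgpr_banks[bank][reg])   # one VGPR read per cycle

def dual_issue(op):
    # both ALU pipelines draw from the cache, not from the VGPR banks
    b = operand_cache.pop()
    a = operand_cache.pop()
    return op(a), op(b)
```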
-
Publication No.: US11226819B2
Publication Date: 2022-01-18
Application No.: US15818304
Filing Date: 2017-11-20
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Brian Emberling , Michael Mantor
IPC: G06F9/30 , G06F12/0862 , G06F12/0811
Abstract: A processing unit includes a plurality of processing elements and one or more caches. A first thread executes a program that includes one or more prefetch instructions to prefetch information into a first cache. Prefetching is selectively enabled when executing the first thread on a first processing element, dependent upon whether one or more second threads previously executed the program on the first processing element. The first thread is then dispatched to execute the program on the first processing element. In some cases, a dispatcher receives the first thread for dispatching to the first processing element. The dispatcher modifies the prefetch instruction to disable prefetching into the first cache in response to the one or more second threads having previously executed the program on the first processing element.
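The dispatch-time decision described above reduces to a history lookup, sketched here (illustrative only; the history table and function names are assumptions): prefetching is disabled when the same program already ran on that processing element, since its data is likely cached.

```python
# Sketch of selective prefetch enable: the dispatcher disables a thread's
# prefetch instructions if an earlier thread already executed the same
# program on the same processing element.

history = {}   # processing element -> set of programs already executed there

def dispatch(program, element):
    seen = history.setdefault(element, set())
    prefetch_enabled = program not in seen   # disable on a repeat run
    seen.add(program)
    return prefetch_enabled
```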
-
Publication No.: US11200060B1
Publication Date: 2021-12-14
Application No.: US17132002
Filing Date: 2020-12-23
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sateesh Lagudu , Arun Vaidyanathan Ananthanarayan , Michael Mantor , Allen H. Rush
Abstract: An array processor includes processor element arrays (PEAs) distributed in rows and columns. The PEAs are configured to perform operations on parameter values. A first sequencer receives a first direct memory access (DMA) instruction that includes a request to read data from at least one address in memory. A texture address (TA) engine requests the data from the memory based on the at least one address, and a texture data (TD) engine provides the data to the PEAs. The PEAs provide first synchronization signals to the TD engine to indicate availability of registers for receiving the data. The TD engine provides second synchronization signals to the first sequencer in response to receiving acknowledgments that the PEAs have consumed the data.
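The two-level handshake in this abstract can be modeled as a small sketch (illustrative only; the class and signal names are assumptions): PEAs signal register availability to the TD engine, which delivers data and then acknowledges consumption back toward the sequencer.

```python
# Sketch of the synchronization handshake: PEAs signal free registers to the
# texture-data (TD) engine; the TD engine delivers data and signals the
# sequencer once each item has been consumed.

class PEA:
    def __init__(self):
        self.reg = None
    def ready(self):
        return self.reg is None       # first sync signal: register available
    def consume(self, data):
        self.reg = data

def td_engine(data_items, peas, sequencer_acks):
    # deliver each item to a PEA that signalled availability, then ack
    for data in data_items:
        pea = next(p for p in peas if p.ready())
        pea.consume(data)
        sequencer_acks.append(data)   # second sync signal: to the sequencer
```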
-
Publication No.: US20210117269A1
Publication Date: 2021-04-22
Application No.: US17113815
Filing Date: 2020-12-07
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos , Michael Mantor , Sudhanva Gurumurthi
IPC: G06F11/10 , G06F12/0866 , G06F11/16
Abstract: A system and method for protecting memory instructions against faults are described. The system and method include converting slave instructions to dummy operations, modifying a memory arbiter to issue up to N master and N slave global/shared memory instructions per cycle, sending master memory requests to the memory system, using slave requests for error checking, entering master requests into the GM/LM FIFO, storing slave requests in a register, and comparing the entered master requests with the stored slave requests.
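The redundancy check at the heart of this abstract can be sketched as follows (illustrative only; the deque standing in for the GM/LM FIFO and the function names are assumptions): master requests go toward memory, slave requests are held back, and a mismatch between the two streams signals a fault.

```python
# Sketch of master/slave redundant memory requests: the slave copy is kept
# in a register purely for error checking against the master requests
# entered into the FIFO.

from collections import deque

fifo = deque()      # stands in for the GM/LM FIFO of master requests
slave_reg = []      # register holding slave requests for comparison

def issue(master_req, slave_req):
    fifo.append(master_req)       # master goes to the memory system
    slave_reg.append(slave_req)   # slave is used only for error checking

def check():
    # any mismatch between the master and slave streams indicates a fault
    return all(m == s for m, s in zip(fifo, slave_reg))
```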
-
Publication No.: US10860418B2
Publication Date: 2020-12-08
Application No.: US16378287
Filing Date: 2019-04-08
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos , Michael Mantor , Sudhanva Gurumurthi
IPC: G06F11/10 , G06F11/16 , G06F12/0866 , G06F11/00 , H03M13/00
Abstract: A system and method for protecting memory instructions against faults are described. The system and method include converting slave instructions to dummy operations, modifying a memory arbiter to issue up to N master and N slave global/shared memory instructions per cycle, sending master memory requests to the memory system, using slave requests for error checking, entering master requests into the GM/LM FIFO, storing slave requests in a register, and comparing the entered master requests with the stored slave requests.
-
Publication No.: US20200210341A1
Publication Date: 2020-07-02
Application No.: US16813075
Filing Date: 2020-03-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Nuwan S. Jayasena , James Michael O'Connor , Michael Mantor
IPC: G06F12/0862 , G06F9/52 , G06F8/41
Abstract: Embodiments include methods, systems, and non-transitory computer-readable media including instructions for executing a prefetch kernel with reduced intermediate state storage resource requirements. These include executing a prefetch kernel on a graphics processing unit (GPU), such that the prefetch kernel begins executing before a processing kernel. The prefetch kernel performs memory operations that are based upon at least a subset of memory operations in the processing kernel.
-
Publication No.: US10664942B2
Publication Date: 2020-05-26
Application No.: US15331278
Filing Date: 2016-10-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Timour T. Paltashev , Michael Mantor , Rex Eldon McCrary
Abstract: A graphics processing unit (GPU) includes a plurality of programmable processing cores configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The plurality of processing cores and the plurality of fixed-function hardware units are configured to implement a configurable number of virtual pipelines to concurrently process different command flows. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.
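The reconfiguration step in this abstract can be sketched as a stage swap (illustrative only; the class, function names, and the doubling operation are assumptions): when a fixed-function unit becomes a bottleneck, a virtual pipeline replaces it with a functionally equivalent emulation running on the programmable cores.

```python
# Sketch of virtual-pipeline reconfiguration: on detecting a bottleneck in
# a fixed-function unit, the pipeline is rewired to use an emulation of
# that unit instantiated on programmable shader cores.

def fixed_unit(x):
    return x * 2          # hypothetical fixed-function behavior

def emulated_unit(x):
    return x * 2          # same result, run on programmable cores instead

class VirtualPipeline:
    def __init__(self):
        self.stage = fixed_unit

    def on_bottleneck(self):
        self.stage = emulated_unit   # reconfigure to use the emulation

    def process(self, x):
        return self.stage(x)
```

The key property is that results are unchanged by the swap; only the hardware resource executing the stage differs.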