-
Publication No.: US12131199B2
Publication date: 2024-10-29
Application No.: US17029935
Filing date: 2020-09-23
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Alexandru Dutu , Matthew David Sinclair , Bradford Beckmann , David A. Wood
CPC classification number: G06F9/522 , G06F9/3005 , G06F9/461 , G06F11/3024 , G06F11/3476 , G06F11/3495 , G06N20/00
Abstract: A processing system monitors and synchronizes parallel execution of workgroups (WGs). One or more of the WGs perform (e.g., periodically or in response to a trigger such as an indication of oversubscription) a waiting atomic instruction. In response to a comparison between an atomic value produced as a result of the waiting atomic instruction and an expected value, WGs that fail to produce a correct atomic value are identified as being in a waiting state (e.g., waiting for a synchronization variable). Execution of WGs in the waiting state is prevented (e.g., by a context switch) until corresponding synchronization variables are released.
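The waiting-atomic check described in this abstract can be sketched as a small software model. This is only an illustration under stated assumptions, not the patented hardware mechanism: the names `SyncVar`, `Monitor`, `waiting_atomic_exchange`, and `release` are hypothetical, and the "atomic" exchange is modeled as a plain single-threaded swap.

```python
from dataclasses import dataclass, field


@dataclass
class SyncVar:
    """Hypothetical synchronization variable: 0 = free, 1 = held."""
    value: int = 0


@dataclass
class Monitor:
    """Hypothetical monitor that tracks which workgroups (WGs) are waiting."""
    waiting: dict = field(default_factory=dict)  # wg_id -> SyncVar it waits on

    def waiting_atomic_exchange(self, wg_id, var, new, expected):
        """Swap `new` into `var` and compare the old value with `expected`.

        A mismatch means the WG did not obtain the value it needed, so it is
        recorded as waiting (a real system would context-switch it out).
        """
        old, var.value = var.value, new
        if old != expected:
            self.waiting[wg_id] = var
            return False
        return True

    def release(self, var):
        """Free `var` and return the WGs that may now be resumed."""
        var.value = 0
        woken = [wg for wg, v in self.waiting.items() if v is var]
        for wg in woken:
            del self.waiting[wg]
        return woken
```

In this model the first WG to call `waiting_atomic_exchange(wg, lock, 1, 0)` on a free lock succeeds, later callers are placed in `waiting`, and `release(lock)` returns them for rescheduling.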
-
Publication No.: US20210096909A1
Publication date: 2021-04-01
Application No.: US16588872
Filing date: 2019-09-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexandru Dutu , Matthew D. Sinclair , Bradford M. Beckmann , David A. Wood
Abstract: A technique for synchronizing workgroups is provided. The technique comprises detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.
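The ready-queue scheduling in this abstract can be illustrated with a minimal sketch. The class name `ReadyQueues`, the status labels, and the rule that a lower number means higher priority are assumptions made for the example; the abstract does not specify them.

```python
from collections import deque


class ReadyQueues:
    """Hypothetical model: one FIFO ready queue per synchronization status."""

    def __init__(self, priorities):
        # priorities maps a status label to a number; lower is served first
        # (an assumed convention, not taken from the patent).
        self.priority = dict(priorities)
        self.queues = {status: deque() for status in self.priority}

    def make_ready(self, wg_id, sync_status):
        """Place a non-executing workgroup that became ready into its queue."""
        self.queues[sync_status].append(wg_id)

    def schedule(self, free_slots):
        """Pick up to `free_slots` workgroups, draining queues by priority."""
        scheduled = []
        for status in sorted(self.queues, key=self.priority.get):
            queue = self.queues[status]
            while queue and len(scheduled) < free_slots:
                scheduled.append(queue.popleft())
        return scheduled
```

With priorities `{"holds_lock": 0, "no_sync": 1}`, a workgroup that holds a lock is scheduled before workgroups without synchronization state, even if it became ready later.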
-
Publication No.: US10360652B2
Publication date: 2019-07-23
Application No.: US14304483
Filing date: 2014-06-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Marc S. Orr , Bradford M. Beckmann , Benedict R. Gaster , Steven K. Reinhardt , David A. Wood
IPC: G06T1/20
Abstract: A processor comprising hardware logic configured to execute a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.
-
Publication No.: US20160357551A1
Publication date: 2016-12-08
Application No.: US14728643
Filing date: 2015-06-02
Applicant: Advanced Micro Devices, Inc.
Inventor: David A. Wood , Steven K. Reinhardt , Bradford M. Beckmann , Marc S. Orr
IPC: G06F9/30
CPC classification number: G06F9/3004 , G06F9/30072 , G06F9/30087 , G06F9/345 , G06F9/3851 , G06F9/52 , G06F9/526
Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory location stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of a wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.
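The leader-based execution described in this abstract can be sketched in software as follows. This is a single-threaded model under stated assumptions: `compare_and_swap` stands in for a hardware CAS, the leader is chosen as the lowest active lane id (an assumed policy), and `phi` is whatever update function the caller supplies. The abstract says the value is passed to the threads when the CAS succeeds; this sketch also reports a failed attempt so that every lane sees the same outcome.

```python
def compare_and_swap(memory, addr, expected, new):
    """Model of CAS: write `new` only if memory[addr] == expected.

    Returns (success, old_value).
    """
    old = memory[addr]
    if old == expected:
        memory[addr] = new
        return True, old
    return False, old


def conditional_fetch_and_phi(memory, addr, expected, phi, active_threads):
    """One leader performs the CAS; its result is shared with all lanes.

    Returns (leader, results) where results maps each active thread id to
    the same (success, old_value) pair.
    """
    leader = min(active_threads)  # assumed selection policy: lowest lane id
    success, old = compare_and_swap(memory, addr, expected, phi(expected))
    results = {tid: (success, old) for tid in active_threads}
    return leader, results
```

For example, with `memory = {0: 5}` and `phi = lambda v: v + 1`, a call with `expected=5` updates the location to 6 and hands `(True, 5)` to every lane; repeating the same call then fails for all lanes because the location no longer holds 5.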
-
Publication No.: US12190174B2
Publication date: 2025-01-07
Application No.: US16425881
Filing date: 2019-05-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexandru Dutu , Sergey Blagodurov , Anthony T. Gutierrez , Matthew D. Sinclair , David A. Wood , Bradford M. Beckmann
Abstract: A technique for synchronizing workgroups is provided. Multiple workgroups execute a wait instruction that specifies a condition variable and a condition. A workgroup scheduler stops execution of a workgroup that executes a wait instruction and an advanced controller begins monitoring the condition variable. In response to the advanced controller detecting that the condition is met, the workgroup scheduler determines whether there is a high contention scenario, which occurs when the wait instruction is part of a mutual exclusion synchronization primitive and is detected by determining that there is a low number of updates to the condition variable prior to detecting that the condition has been met. In a high contention scenario, the workgroup scheduler wakes up one workgroup and schedules another workgroup to be woken up at a time in the future. In a non-contention scenario, more than one workgroup can be woken up at the same time.
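The wake-up policy in this abstract can be sketched as a small decision function. The function name `pick_wakeups`, the parameter `low_update_threshold`, and its default value are hypothetical; the abstract only states that high contention is inferred from a low number of updates to the condition variable before the condition is met.

```python
def pick_wakeups(waiting, updates_before_met, low_update_threshold=2):
    """Decide which waiting workgroups to wake once the condition is met.

    Returns (wake_now, wake_later):
    - high contention (few updates, typical of a mutex): wake one workgroup
      now and mark one more to be woken at a later time;
    - otherwise: wake every waiting workgroup at once.
    The threshold is an assumption chosen for illustration.
    """
    if not waiting:
        return [], []
    high_contention = updates_before_met <= low_update_threshold
    if high_contention:
        return list(waiting[:1]), list(waiting[1:2])
    return list(waiting), []
```

A barrier-style condition that sees many updates before it is met wakes all waiters together, while a mutex-style condition wakes them one at a time to avoid a burst of failed acquisitions.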
-
Publication No.: US11604738B2
Publication date: 2023-03-14
Application No.: US16146543
Filing date: 2018-09-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Shomit N. Das , Matthew Tomei , David A. Wood
IPC: G06F12/0897 , G06F12/0811
Abstract: A processing device is provided which includes memory comprising data cache memory configured to store compressed data and metadata cache memory configured to store metadata, each portion of metadata comprising an encoding used to compress a portion of data. The processing device also includes at least one processor configured to compress portions of data and select, based on one or more utility level metrics, portions of metadata to be stored in the metadata cache memory. The at least one processor is also configured to store, in the metadata cache memory, the portions of metadata selected to be stored in the metadata cache memory, and to store, in the data cache memory, each portion of compressed data having a selected portion of corresponding metadata stored in the metadata cache memory. Each portion of compressed data, having the selected portion of corresponding metadata stored in the metadata cache memory, is decompressed.
-
Publication No.: US10860489B2
Publication date: 2020-12-08
Application No.: US16176828
Filing date: 2018-10-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Shomit N. Das , Matthew Tomei , David A. Wood
IPC: H03M7/00 , G06F12/0871 , H03M7/30 , G06F30/00
Abstract: Techniques are disclosed for designing cache compression algorithms that control how data in caches are compressed. The techniques generate a custom “byte select algorithm” by repeatedly applying transforms to an initial compression algorithm until a set of suitability criteria is met. The suitability criteria include that the “cost” is below a threshold and that a metadata constraint is met. The “cost” is the number of blocks that can be compressed by an algorithm as compared with the “ideal” algorithm. The metadata constraint is the number of bits required for metadata.
-
Publication No.: US09804883B2
Publication date: 2017-10-31
Application No.: US14542042
Filing date: 2014-11-14
Applicant: Advanced Micro Devices, Inc.
Inventor: Marc S. Orr , Bradford M. Beckmann , Ayse Yilmazer , Shuai Che , David A. Wood , Mark D. Hill
IPC: G06F9/46 , G06F12/0806 , G06F3/06 , G06F12/0871
CPC classification number: G06F9/46 , G06F3/06 , G06F12/0806 , G06F12/0871 , G06F2212/222 , G06F2212/313 , G06F2212/401
Abstract: Described herein is an apparatus and method for remote scoped synchronization, which is a new semantic that allows a work-item to order memory accesses with a scope instance outside of its scope hierarchy. More precisely, remote synchronization expands visibility at a particular scope to all scope-instances encompassed by that scope. Remote scoped synchronization operation allows smaller scopes to be used more frequently and defers added cost to only when larger scoped synchronization is required. This enables programmers to optimize the scope that memory operations are performed at for important communication patterns like work stealing. Executing memory operations at the optimum scope reduces both execution time and energy. In particular, remote synchronization allows a work-item to communicate with a scope that it otherwise would not be able to access. Specifically, work-items can pull valid data from and push updates to scopes that do not (hierarchically) contain them.
-
Publication No.: US11809902B2
Publication date: 2023-11-07
Application No.: US17031424
Filing date: 2020-09-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexandru Dutu , Marcus Nathaniel Chow , Matthew D. Sinclair , Bradford M. Beckmann , David A. Wood
CPC classification number: G06F9/4881 , G06F9/3838 , G06F9/545
Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.
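The dispatch reordering in this abstract can be sketched as a simple list operation. The function `dispatch_order` and its arguments are hypothetical: `pending` is the default dispatch order of the second kernel's workgroups, and `hinted` lists the workgroups named by dependency instructions, in the order those instructions were executed.

```python
def dispatch_order(pending, hinted):
    """Return the order in which pending workgroups would be dispatched.

    Workgroups named by a dependency instruction go first (in hint order,
    duplicates and unknown ids ignored); the rest keep their default order.
    """
    hinted_first = [wg for wg in dict.fromkeys(hinted) if wg in pending]
    rest = [wg for wg in pending if wg not in hinted_first]
    return hinted_first + rest
```

For example, if a workgroup of the first kernel hints at workgroup 2 of the second kernel, `dispatch_order([0, 1, 2, 3], [2])` places workgroup 2 ahead of workgroups 0, 1, and 3, for which no hint was executed.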
-
Publication No.: US11481250B2
Publication date: 2022-10-25
Application No.: US16024244
Filing date: 2018-06-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Alexandru Dutu , Matthew David Sinclair , Bradford Beckmann , David A. Wood
Abstract: A first workgroup is preempted in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal. The first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value. A second workgroup is scheduled for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value. A third context is prefetched into registers of the processor core based on the first hint and the second value. The first context is stored in a first portion of the registers and the second context is prefetched into a second portion of the registers prior to preempting the first workgroup.