Patent search ap:("Advanced Micro Devices Page Inc.") AND inv:"David A. Wood"

11.

发明申请
FINE-GRAINED CONDITIONAL DISPATCHING 有权

公开(公告)号：US20220091880A1

公开(公告)日：2022-03-24

申请号：US17031424

申请日：2020-09-24

Applicant: Advanced Micro Devices, Inc.

Inventor： Alexandru Dutu , Marcus Nathaniel Chow , Matthew D. Sinclair , Bradford M. Beckmann , David A. Wood

IPC: G06F9/48 , G06F9/38 , G06F9/54

Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.

12.

发明申请
PATTERN-BASED CACHE BLOCK COMPRESSION 有权

公开(公告)号：US20210157485A1

公开(公告)日：2021-05-27

申请号：US17029158

申请日：2020-09-23

Applicant: Advanced Micro Devices, Inc.

Inventor： Matthew Tomei , Shomit N. Das , David A. Wood

IPC: G06F3/06 , G06F12/0802

Abstract: Systems, methods, and devices for performing pattern-based cache block compression and decompression. An uncompressed cache block is input to the compressor. Byte values are identified within the uncompressed cache block. A cache block pattern is searched for in a set of cache block patterns based on the byte values. A compressed cache block is output based on the byte values and the cache block pattern. A compressed cache block is input to the decompressor. A cache block pattern is identified based on metadata of the cache block. The cache block pattern is applied to a byte dictionary of the cache block. An uncompressed cache block is output based on the cache block pattern and the byte dictionary. A subset of cache block patterns is determined from a training cache trace based on a set of compressed sizes and a target number of patterns for each size.

13.

发明授权
Message aggregation, combining and compression for efficient data communications in GPU-based clusters 有权

公开(公告)号：US10320695B2

公开(公告)日：2019-06-11

申请号：US15165953

申请日：2016-05-26

Applicant: Advanced Micro Devices, Inc.

Inventor： Steven K. Reinhardt , Marc S. Orr , Bradford M. Beckmann , Shuai Che , David A. Wood

IPC: G06F15/173 , H04L12/805 , H04L12/811

Abstract: A system and method for efficient management of network traffic management of highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.

14.

发明公开
FINE-GRAINED CONDITIONAL DISPATCHING 审中-公开

公开(公告)号：US20240045718A1

公开(公告)日：2024-02-08

申请号：US18488794

申请日：2023-10-17

Applicant: Advanced Micro Devices, Inc.

Inventor： Alexandru Dutu , Marcus Nathaniel Chow , Matthew D. Sinclair , Bradford M. Beckmann , David A. Wood

IPC: G06F9/48 , G06F9/54 , G06F9/38

CPC classification number: G06F9/4881 , G06F9/545 , G06F9/3838

Abstract: Techniques for executing workgroups are provided. The techniques include executing, for a first workgroup of a first kernel dispatch, a workgroup dependency instruction that includes an indication to prioritize execution of a second workgroup of a second kernel dispatch, and in response to the workgroup dependency instruction, dispatching the second workgroup of the second kernel dispatch prior to dispatching a third workgroup of the second kernel dispatch, wherein no workgroup dependency instruction including an indication to prioritize execution of the third workgroup has been executed.

15.

发明申请
SYNCHRONIZATION MECHANISM FOR WORKGROUPS 审中-公开

公开(公告)号：US20200379820A1

公开(公告)日：2020-12-03

申请号：US16425881

申请日：2019-05-29

Applicant: Advanced Micro Devices, Inc.

Inventor： Alexandru Dutu , Sergey Blagodurov , Anthony T. Gutierrez , Matthew D. Sinclair , David A. Wood , Bradford M. Beckmann

IPC: G06F9/52 , G06F9/48 , G06F11/30 , G06F11/34

Abstract: A technique for synchronizing workgroups is provided. Multiple workgroups execute a wait instruction that specifies a condition variable and a condition. A workgroup scheduler stops execution of a workgroup that executes a wait instruction and an advanced controller begins monitoring the condition variable. In response to the advanced controller detecting that the condition is met, the workgroup scheduler determines whether there is a high contention scenario, which occurs when the wait instruction is part of a mutual exclusion synchronization primitive and is detected by determining that there is a low number of updates to the condition variable prior to detecting that the condition has been met. In a high contention scenario, the workgroup scheduler wakes up one workgroup and schedules another workgroup to be woken up at a time in the future. In a non-contention scenario, more than one workgroup can be woken up at the same time.

16.

发明申请
BYTE SELECT CACHE COMPRESSION 审中-公开

公开(公告)号：US20200133866A1

公开(公告)日：2020-04-30

申请号：US16176828

申请日：2018-10-31

Applicant: Advanced Micro Devices, Inc.

Inventor： Shomit N. Das , Matthew Tomei , David A. Wood

IPC: G06F12/0871 , G06F17/50 , H03M7/30

Abstract: The disclosure herein provides techniques for designing cache compression algorithms that control how data in caches are compressed. The techniques generate a custom “byte select algorithm” by applying repeated transforms applied to an initial compression algorithm until a set of suitability criteria is met. The suitability criteria include that the “cost” is below a threshold and that a metadata constraint is met. The “cost” is the number of blocks that can be compressed by an algorithm as compared with the “ideal” algorithm. The metadata constraint is the number of bits required for metadata.

17.

发明授权
Conditional atomic operations in single instruction multiple data processors 有权

公开(公告)号：US10209990B2

公开(公告)日：2019-02-19

申请号：US14728643

申请日：2015-06-02

Applicant: Advanced Micro Devices, Inc.

Inventor： David A. Wood , Steven K. Reinhardt , Bradford M. Beckmann , Marc S. Orr

IPC: G06F9/52 , G06F9/30 , G06F9/345 , G06F9/38

Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory locations stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.

18.

发明授权
Pattern-based cache block compression 有权

公开(公告)号：US12001237B2

公开(公告)日：2024-06-04

申请号：US17029158

申请日：2020-09-23

Applicant: Advanced Micro Devices, Inc.

Inventor： Matthew Tomei , Shomit N. Das , David A. Wood

IPC: G06F12/00 , G06F3/06 , G06F12/0802

CPC classification number: G06F3/0608 , G06F3/0655 , G06F3/0676 , G06F3/0679 , G06F12/0802

Abstract: Systems, methods, and devices for performing pattern-based cache block compression and decompression. An uncompressed cache block is input to the compressor. Byte values are identified within the uncompressed cache block. A cache block pattern is searched for in a set of cache block patterns based on the byte values. A compressed cache block is output based on the byte values and the cache block pattern. A compressed cache block is input to the decompressor. A cache block pattern is identified based on metadata of the cache block. The cache block pattern is applied to a byte dictionary of the cache block. An uncompressed cache block is output based on the cache block pattern and the byte dictionary. A subset of cache block patterns is determined from a training cache trace based on a set of compressed sizes and a target number of patterns for each size.

19.

发明授权
Enhanced atomics for workgroup synchronization 有权

公开(公告)号：US11288095B2

公开(公告)日：2022-03-29

申请号：US16588872

申请日：2019-09-30

Applicant: Advanced Micro Devices, Inc.

Inventor： Alexandru Dutu , Matthew D. Sinclair , Bradford M. Beckmann , David A. Wood

IPC: G06F9/46 , G06F9/48 , G06F9/52

Abstract: A technique for synchronizing workgroups is provided. The techniques comprise detecting that one or more non-executing workgroups are ready to execute, placing the one or more non-executing workgroups into one or more ready queues based on the synchronization status of the one or more workgroups, detecting that computing resources are available for execution of one or more ready workgroups, and scheduling for execution one or more ready workgroups from the one or more ready queues in an order that is based on the relative priority of the ready queues.

20.

发明申请
MESSAGE AGGREGATION, COMBINING AND COMPRESSION FOR EFFICIENT DATA COMMUNICATIONS IN GPU-BASED CLUSTERS 审中-公开
Title translation: 基于GPU的群集中的有效数据通信的消息聚合，组合和压缩

公开(公告)号：US20160352598A1

公开(公告)日：2016-12-01

申请号：US15165953

申请日：2016-05-26

Applicant: Advanced Micro Devices, Inc.

Inventor： Steven K. Reinhardt , Marc S. Orr , Bradford M. Beckmann , Shuai Che , David A. Wood

IPC: H04L12/26 , H04L12/24

CPC classification number: H04L47/365

Abstract: A system and method for efficient management of network traffic management of highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.

Abstract translation: 一种高效数据并行计算网络流量管理高效管理的系统和方法。处理节点包括能够生成网络消息的一个或多个处理器。网络接口用于通过网络接收和发送网络消息。处理节点将原始网络消息的数量或存储大小中的至少一个减少到一个或多个新的网络消息中。新的网络消息被发送到网络接口以在网络上发送。

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification