专利检索 ap:("ADVANCED MICRO DEVICES, INC." OR "ATI TECNOLOGIES ULC") AND inv:"Milind N. Nemlekar" 第 1 页

1.

发明申请
HARDWARE MANAGEMENT OF DIRECT MEMORY ACCESS COMMANDS 有权

公开(公告)号：US20230132931A1

公开(公告)日：2023-05-04

申请号：US17515976

申请日：2021-11-01

申请人： ADVANCED MICRO DEVICES, INC. , ATI TECNOLOGIES ULC

发明人： Joseph L. Greathouse , Sean Keely , Alan D. Smith , Anthony Asaro , Ling-Ling Wang , Milind N. Nemlekar , Hari Thangirala , Felix Kuehling

IPC分类号： G06F3/06 , G06F13/28

摘要： A method for hardware management of DMA transfer commands includes accessing, by a first DMA engine, a DMA transfer command and determining a first portion of a data transfer requested by the DMA transfer command. Transfer of a first portion of the data transfer by the first DMA engine is initiated based at least in part on the DMA transfer command. Similarly, a second portion of the data transfer by a second DMA engine is initiated based at least in part on the DMA transfer command. After transferring the first portion and the second portion of the data transfer, an indication is generated that signals completion of the data transfer requested by the DMA transfer command.

2.

发明授权
Dynamic modification of coherent atomic memory operations 有权

公开(公告)号：US11604737B1

公开(公告)日：2023-03-14

申请号：US17516860

申请日：2021-11-02

申请人： ADVANCED MICRO DEVICES, INC.

发明人： Joseph L. Greathouse , Steven Tony Tye , Mark Fowler , Milind N. Nemlekar

IPC分类号： G06F12/00 , G06F12/0891 , G06F12/0831 , G06F9/448 , G06F9/30 , G06F12/0888

摘要： A processing device determines a scope indicating at least a portion of the processing system and target data from atomic memory operation to be performed. Based on the scope, the processing device determines one or more hardware parameters for at least a portion of the processing system. The processing device then compares the hardware parameters to the scope and target data to determine one or more corrections. The processing device then provides the scope, target data, hardware parameters, and corrections to a plurality of hardware lookup tables. The hardware lookup tables are configured to receive the scope, target data, hardware parameters, and corrections as inputs and output values indicating one or more coherency actions and one or more orderings. The processing device then executes one or more of the indicated coherency actions and the atomic memory operation based on the indicated ordering.

3.

发明授权
Fused convolution and batch normalization for neural networks 有权

公开(公告)号：US11573765B2

公开(公告)日：2023-02-07

申请号：US16219154

申请日：2018-12-13

申请人： ADVANCED MICRO DEVICES, INC.

发明人： Milind N. Nemlekar , Prerit Dak

IPC分类号： G06N3/04 , G06F5/01 , G06F17/16 , G06N3/08

摘要： A processing unit implements a convolutional neural network (CNN) by fusing at least a portion of a convolution phase of the CNN with at least a portion of a batch normalization phase. The processing unit convolves two input matrices representing inputs and weights of a portion of the CNN to generate an output matrix. The processing unit performs the convolution via a series of multiplication operations, with each multiplication operation generating a corresponding submatrix (or “tile”) of the output matrix at an output register of the processing unit. While an output submatrix is stored at the output register, the processing unit performs a reduction phase and an update phase of the batch normalization phase for the CNN. The processing unit thus fuses at least a portion of the batch normalization phase of the CNN with a portion of the convolution.

4.

发明申请
DYNAMICALLY RECONFIGURABLE REGISTER FILE 有权

公开(公告)号：US20220309606A1

公开(公告)日：2022-09-29

申请号：US17214762

申请日：2021-03-26

申请人： Advanced Micro Devices, Inc.

发明人： Pramod Vasant Argade , Martin G. Sarov , Milind N. Nemlekar

IPC分类号： G06T1/20 , G06F9/30 , G06F9/52

摘要： Techniques for managing register allocation are provided. The techniques include detecting a first request to allocate first registers for a first wavefront; first determining, based on allocation information, that allocating the first registers to the first wavefront would result in a condition in which a deadlock is possible; in response to the first determining, refraining from allocating the first registers to the first wavefront; detecting a second request to allocate second registers for a second wavefront; second determining, based on the allocation information, that allocating the second registers to the second wavefront would result in a condition in which deadlock is not possible; and in response to the second determining, allocating the second registers to the second wavefront.

5.

发明授权
Multi-accelerator compute dispatch 有权

公开(公告)号：US11790590B2

公开(公告)日：2023-10-17

申请号：US17218421

申请日：2021-03-31

申请人： Advanced Micro Devices, Inc.

发明人： Milind N. Nemlekar , Maxim V. Kazakov , Prerit Dak

IPC分类号： G06T15/00 , G06F9/54 , G06T15/80

CPC分类号： G06T15/005 , G06F9/545 , G06T15/80

摘要： Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.

6.

发明公开
RELAXED INVALIDATION FOR CACHE COHERENCE 审中-公开

公开(公告)号：US20230195628A1

公开(公告)日：2023-06-22

申请号：US17558034

申请日：2021-12-21

申请人： ADVANCED MICRO DEVICES, INC.

发明人： Akhil Arunkumar , Tarun Nakra , Maxim V. Kazakov , Milind N. Nemlekar

IPC分类号： G06F12/0811 , G06F12/0853 , G06F13/16

CPC分类号： G06F12/0811 , G06F12/0853 , G06F13/1642 , G06F13/1668

摘要： Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.

7.

发明授权
Pipelined matrix multiplication at a graphics processing unit 有权

公开(公告)号：US11175946B2

公开(公告)日：2021-11-16

申请号：US16211954

申请日：2018-12-06

申请人： ADVANCED MICRO DEVICES, INC.

发明人： Milind N. Nemlekar

IPC分类号： G06F17/16 , G06N3/04 , G06F9/48 , G06T1/20

摘要： A graphics processing unit (GPU) schedules recurrent matrix multiplication operations at different subsets of CUs of the GPU. The GPU includes a scheduler that receives sets of recurrent matrix multiplication operations, such as multiplication operations associated with a recurrent neural network (RNN). The multiple operations associated with, for example, an RNN layer are fused into a single kernel, which is scheduled by the scheduler such that one work group is assigned per compute unit, thus assigning different ones of the recurrent matrix multiplication operations to different subsets of the CUs of the GPU. In addition, via software synchronization of the different workgroups, the GPU pipelines the assigned matrix multiplication operations so that each subset of CUs provides corresponding multiplication results to a different subset, and so that each subset of CUs executes at least a portion of the multiplication operations concurrently.

8.

发明申请
CHIPLET-INTEGRATED MACHINE LEARNING ACCELERATORS 有权

公开(公告)号：US20210026686A1

公开(公告)日：2021-01-28

申请号：US16933863

申请日：2020-07-20

申请人： Advanced Micro Devices, Inc.

发明人： Swapnil P. Sakharshete , Andrew S. Pomianowski , Maxim V. Kazakov , Vineet Goel , Milind N. Nemlekar , Skyler Jonathon Saleh

IPC分类号： G06F9/48 , G06F9/38 , G06F9/30 , G06F12/0893 , G06F12/128 , G06N20/00 , G06F13/28

摘要： Techniques for performing machine learning operations are provided. The techniques include configuring a first portion of a first chiplet as a cache; performing caching operations via the first portion; configuring at least a first sub-portion of the first portion of the chiplet as directly-accessible memory; and performing machine learning operations with the first sub-portion by a machine learning accelerator within the first chiplet.

9.

发明授权
Dynamically reconfigurable register file 有权

公开(公告)号：US12067640B2

公开(公告)日：2024-08-20

申请号：US17214762

申请日：2021-03-26

申请人： Advanced Micro Devices, Inc.

发明人： Pramod Vasant Argade , Martin G. Sarov , Milind N. Nemlekar

IPC分类号： G06T1/20 , G06F9/30 , G06F9/52 , G06T1/60

CPC分类号： G06T1/20 , G06F9/30105 , G06F9/3012 , G06F9/524 , G06T1/60

摘要： Techniques for managing register allocation are provided. The techniques include detecting a first request to allocate first registers for a first wavefront; first determining, based on allocation information, that allocating the first registers to the first wavefront would result in a condition in which a deadlock is possible; in response to the first determining, refraining from allocating the first registers to the first wavefront; detecting a second request to allocate second registers for a second wavefront; second determining, based on the allocation information, that allocating the second registers to the second wavefront would result in a condition in which deadlock is not possible; and in response to the second determining, allocating the second registers to the second wavefront.

10.

发明授权
Relaxed invalidation for cache coherence 有权

公开(公告)号：US11960399B2

公开(公告)日：2024-04-16

申请号：US17558034

申请日：2021-12-21

申请人： ADVANCED MICRO DEVICES, INC.

发明人： Akhil Arunkumar , Tarun Nakra , Maxim V. Kazakov , Milind N. Nemlekar

IPC分类号： G06F12/0811 , G06F12/0853 , G06F13/16

CPC分类号： G06F12/0811 , G06F12/0853 , G06F13/1642 , G06F13/1668

摘要： Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类