MULTI-ACCELERATOR COMPUTE DISPATCH
    Invention Application

    Publication No.: US20220319089A1

    Publication Date: 2022-10-06

    Application No.: US17218421

    Application Date: 2021-03-31

    Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.
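The dispatch flow in the abstract can be sketched as follows. This is an illustrative simulation only; the names (`Chiplet`, `dispatch`, the round-robin assignment policy, and the doubling "work") are assumptions, not details from the patent.

```python
class Chiplet:
    def __init__(self, cid):
        self.cid = cid
        self.done = False  # completion flag, visible to the other chiplets

    def execute(self, workgroups):
        # Execute the workgroups assigned to this chiplet (simulated work).
        results = [wg * 2 for wg in workgroups]
        self.done = True   # "notify" the other chiplets of completion
        return results

def dispatch(packet_workgroups, chiplets):
    # Assign workgroups of the kernel dispatch packet to the chiplets
    # (round-robin here, purely for illustration).
    assignments = {c.cid: [] for c in chiplets}
    for i, wg in enumerate(packet_workgroups):
        assignments[chiplets[i % len(chiplets)].cid].append(wg)
    results = []
    for c in chiplets:
        results.extend(c.execute(assignments[c.cid]))
    # Once every chiplet reports completion of its assigned workgroups,
    # the client is notified and dispatch proceeds to the next packet.
    all_done = all(c.done for c in chiplets)
    return results, all_done
```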

    Dynamically reconfigurable register file

    Publication No.: US12067640B2

    Publication Date: 2024-08-20

    Application No.: US17214762

    Application Date: 2021-03-26

    CPC classification number: G06T1/20 G06F9/30105 G06F9/3012 G06F9/524 G06T1/60

    Abstract: Techniques for managing register allocation are provided. The techniques include detecting a first request to allocate first registers for a first wavefront; first determining, based on allocation information, that allocating the first registers to the first wavefront would result in a condition in which a deadlock is possible; in response to the first determining, refraining from allocating the first registers to the first wavefront; detecting a second request to allocate second registers for a second wavefront; second determining, based on the allocation information, that allocating the second registers to the second wavefront would result in a condition in which deadlock is not possible; and in response to the second determining, allocating the second registers to the second wavefront.
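A minimal sketch of the allocate-or-refrain decision described above. All names are assumptions, and the "deadlock is possible" condition is modeled crudely as "the request would leave fewer free registers than a reserve needed for some in-flight wavefront to make progress"; the patent's actual condition is based on richer allocation information.

```python
class RegisterAllocator:
    def __init__(self, total_regs, reserve):
        self.free = total_regs
        self.reserve = reserve  # registers kept free so progress stays possible

    def try_allocate(self, wavefront_id, count):
        # Determine whether granting this request could make deadlock possible.
        if self.free - count < self.reserve:
            return False        # refrain from allocating to this wavefront
        self.free -= count      # deadlock not possible: grant the registers
        return True
```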

    Relaxed invalidation for cache coherence

    Publication No.: US11960399B2

    Publication Date: 2024-04-16

    Application No.: US17558034

    Application Date: 2021-12-21

    CPC classification number: G06F12/0811 G06F12/0853 G06F13/1642 G06F13/1668

    Abstract: Methods, systems, and devices maintain state information in a shadow tag memory for a plurality of cachelines in each of a plurality of private caches, with each of the private caches being associated with a corresponding one of multiple processing cores. One or more cache probes are generated based on a write operation associated with one or more cachelines of the plurality of cachelines, such that each of the cache probes is associated with cachelines of a particular private cache of the multiple private caches, the particular private cache being associated with an indicated processing core. Transmission of the cache probes to the particular private cache is prevented until, responsive to a scope acquire operation from the indicated processing core, the cache probes are released for transmission to the respectively associated cachelines in the particular private cache.
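The buffer-then-release behavior can be sketched as below. This is a toy model under stated assumptions: `CoherenceDirectory`, the per-core dicts, and modeling a private cache as a plain set are all illustrative, not the patent's structures.

```python
from collections import defaultdict

class CoherenceDirectory:
    def __init__(self):
        self.shadow_tags = defaultdict(set)  # core -> cachelines it holds
        self.pending = defaultdict(list)     # core -> buffered cache probes

    def record_fill(self, core, line):
        # Shadow tag memory tracks which private cache holds which line.
        self.shadow_tags[core].add(line)

    def on_write(self, writer, line):
        # A write generates probes for every other core caching the line,
        # but they are held back rather than sent immediately.
        for core, lines in self.shadow_tags.items():
            if core != writer and line in lines:
                self.pending[core].append(line)

    def scope_acquire(self, core, private_cache):
        # The scope acquire releases the buffered probes, which now
        # invalidate the affected lines in that core's private cache.
        for line in self.pending.pop(core, []):
            private_cache.discard(line)
            self.shadow_tags[core].discard(line)
```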

    MULTI-ACCELERATOR COMPUTE DISPATCH
    Invention Publication

    Publication No.: US20240029336A1

    Publication Date: 2024-01-25

    Application No.: US18480466

    Application Date: 2023-10-03

    CPC classification number: G06T15/005 G06F9/545 G06T15/80

    Abstract: Techniques for executing computing work by a plurality of chiplets are provided. The techniques include assigning workgroups of a kernel dispatch packet to the chiplets; by each chiplet, executing the workgroups assigned to that chiplet; for each chiplet, upon completion of all workgroups assigned to that chiplet for the kernel dispatch packet, notifying the other chiplets of such completion; and upon completion of all workgroups of the kernel dispatch packet, notifying a client of such completion and proceeding to a subsequent kernel dispatch packet.

    MACHINE LEARNING CLUSTER PIPELINE FUSION

    Publication No.: US20230004871A1

    Publication Date: 2023-01-05

    Application No.: US17364787

    Application Date: 2021-06-30

    Abstract: Methods, systems, and devices for pipeline fusion of a plurality of kernels. In some implementations, a first batch of a first kernel is executed on a first processing device to generate a first output of the first kernel based on an input. A first batch of a second kernel is executed on a second processing device to generate a first output of the second kernel based on the first output of the first kernel. A second batch of the first kernel is executed on the first processing device to generate a second output of the first kernel based on the input. The execution of the second batch of the first kernel overlaps at least partially in time with executing the first batch of the second kernel.
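The overlap described above amounts to a two-stage pipeline schedule, sketched here as pure bookkeeping. The names and the fixed one-step latency are assumptions for illustration: device 0 runs kernel A, device 1 runs kernel B, and batch b of B (which consumes A's batch-b output) starts one step after A produced it.

```python
def pipeline_schedule(num_batches):
    # Return (step, device, kernel, batch) tuples for a 2-stage pipeline.
    schedule = []
    for b in range(num_batches):
        schedule.append((b, 0, "A", b))      # kernel A, batch b, at step b
        schedule.append((b + 1, 1, "B", b))  # kernel B, batch b, one step later
    return sorted(schedule)
```

At step 1 the schedule has device 0 running A's batch 1 while device 1 runs B's batch 0, i.e. the second batch of the first kernel overlaps in time with the first batch of the second kernel, as the abstract describes.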
