-
Publication No.: US20220138002A1
Publication Date: 2022-05-05
Application No.: US17499708
Filing Date: 2021-10-12
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Milind N. NEMLEKAR
Abstract: A graphics processing unit (GPU) schedules recurrent matrix multiplication operations at different subsets of compute units (CUs) of the GPU. The GPU includes a scheduler that receives sets of recurrent matrix multiplication operations, such as multiplication operations associated with a recurrent neural network (RNN). The multiple operations associated with, for example, an RNN layer are fused into a single kernel, which is scheduled such that one workgroup is assigned per compute unit, thus assigning different recurrent matrix multiplication operations to different subsets of the CUs of the GPU. In addition, via software synchronization of the different workgroups, the GPU pipelines the assigned matrix multiplication operations so that each subset of CUs provides its multiplication results to a different subset, and so that each subset of CUs executes at least a portion of the multiplication operations concurrently.
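The pipelining described in the abstract can be illustrated with a minimal sketch. Here each "CU subset" is modeled as a thread, each per-stage weight matrix as its workload, and the software synchronization between workgroups as queues passing results downstream; the function name `pipeline_rnn` and the overall policy are illustrative, not taken from the patent.

```python
import threading
import queue

def matvec(w, v):
    # Plain matrix-vector product standing in for one stage's matmul.
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def pipeline_rnn(weights, inputs):
    """Run each weight matrix on its own simulated CU subset (a thread);
    queues model the software synchronization between workgroups, so
    stages overlap: while stage i+1 consumes a result, stage i can
    already work on the next input."""
    stages = len(weights)
    links = [queue.Queue() for _ in range(stages + 1)]

    def stage(i):
        while True:
            v = links[i].get()
            if v is None:              # sentinel: propagate shutdown downstream
                links[i + 1].put(None)
                return
            links[i + 1].put(matvec(weights[i], v))

    threads = [threading.Thread(target=stage, args=(i,)) for i in range(stages)]
    for t in threads:
        t.start()
    for v in inputs:
        links[0].put(v)
    links[0].put(None)

    outputs = []
    while True:
        r = links[-1].get()
        if r is None:
            break
        outputs.append(r)
    for t in threads:
        t.join()
    return outputs
```

With two stages and one input vector, the first stage's result flows to the second stage exactly as the abstract describes each CU subset feeding a different subset.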
-
Publication No.: US20230195664A1
Publication Date: 2023-06-22
Application No.: US17558798
Filing Date: 2021-12-22
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Sean KEELY , Joseph L. GREATHOUSE , Hari THANGIRALA , Alan D. SMITH , Milind N. NEMLEKAR
CPC classification number: G06F13/28 , G06F13/1668
Abstract: A method for software management of DMA transfer commands includes receiving a DMA transfer command instructing a data transfer by a first processor device. Based at least in part on a determination of runtime system resource availability, a device different from the first processor device is assigned to assist in transferring at least a first portion of the data. In some embodiments, the DMA transfer command instructs the first processor device to write a copy of data to a third processor device. Software analyzes congestion at a shared communications bus and initiates the DMA transfer via a multi-hop communications path to bypass the congested bus.
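The abstract's idea of splitting a DMA command based on runtime availability can be sketched as a planning function. The function name, the 0.5 availability threshold, and the even split are assumptions for illustration only; the patented method's actual policy is not specified here.

```python
def plan_dma_transfer(cmd, device_load, bus_congested):
    """Illustrative software DMA planner (not the patented algorithm):
    assign the least-loaded device other than the source to assist with
    a portion of the copy, and route the source's share over a multi-hop
    path when the shared bus is congested.

    cmd: {"src": device name, "size": bytes to transfer}
    device_load: mapping of device name -> load in [0, 1]
    bus_congested: True if the direct shared bus is congested
    """
    segments = []
    helpers = [d for d in device_load if d != cmd["src"]]
    helper = min(helpers, key=device_load.get) if helpers else None
    remaining = cmd["size"]
    if helper is not None and device_load[helper] < 0.5:  # assumed threshold
        assist = remaining // 2                           # assumed even split
        segments.append({"engine": helper, "bytes": assist, "route": "direct"})
        remaining -= assist
    route = "multi-hop" if bus_congested else "direct"
    segments.append({"engine": cmd["src"], "bytes": remaining, "route": route})
    return segments
```

For example, with a lightly loaded second GPU and a congested bus, the plan splits the copy and detours the source's half over a multi-hop path.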
-
Publication No.: US20200183734A1
Publication Date: 2020-06-11
Application No.: US16211954
Filing Date: 2018-12-06
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Milind N. NEMLEKAR
Abstract: A graphics processing unit (GPU) schedules recurrent matrix multiplication operations at different subsets of compute units (CUs) of the GPU. The GPU includes a scheduler that receives sets of recurrent matrix multiplication operations, such as multiplication operations associated with a recurrent neural network (RNN). The multiple operations associated with, for example, an RNN layer are fused into a single kernel, which is scheduled such that one workgroup is assigned per compute unit, thus assigning different recurrent matrix multiplication operations to different subsets of the CUs of the GPU. In addition, via software synchronization of the different workgroups, the GPU pipelines the assigned matrix multiplication operations so that each subset of CUs provides its multiplication results to a different subset, and so that each subset of CUs executes at least a portion of the multiplication operations concurrently.