-
Publication No.: US20220138002A1
Publication Date: 2022-05-05
Application No.: US17499708
Filing Date: 2021-10-12
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Milind N. NEMLEKAR
Abstract: A graphics processing unit (GPU) schedules recurrent matrix multiplication operations at different subsets of compute units (CUs) of the GPU. The GPU includes a scheduler that receives sets of recurrent matrix multiplication operations, such as multiplication operations associated with a recurrent neural network (RNN). The multiple operations associated with, for example, an RNN layer are fused into a single kernel, which is scheduled such that one workgroup is assigned per compute unit, thus assigning different recurrent matrix multiplication operations to different subsets of the CUs of the GPU. In addition, via software synchronization of the different workgroups, the GPU pipelines the assigned matrix multiplication operations so that each subset of CUs provides its multiplication results to a different subset, and so that each subset of CUs executes at least a portion of the multiplication operations concurrently.
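The pipelining described in the abstract can be illustrated with a minimal sketch. Here each "CU subset" is modeled as a thread, each per-stage weight matrix as its workload, and the software synchronization between workgroups as queues passing results downstream; the function name `pipeline_rnn` and the overall policy are illustrative, not taken from the patent.

```python
import threading
import queue

def matvec(w, v):
    # Plain matrix-vector product standing in for one stage's matmul.
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def pipeline_rnn(weights, inputs):
    """Run each weight matrix on its own simulated CU subset (a thread);
    queues model the software synchronization between workgroups, so
    stages overlap: while stage i+1 consumes a result, stage i can
    already work on the next input."""
    stages = len(weights)
    links = [queue.Queue() for _ in range(stages + 1)]

    def stage(i):
        while True:
            v = links[i].get()
            if v is None:              # sentinel: propagate shutdown downstream
                links[i + 1].put(None)
                return
            links[i + 1].put(matvec(weights[i], v))

    threads = [threading.Thread(target=stage, args=(i,)) for i in range(stages)]
    for t in threads:
        t.start()
    for v in inputs:
        links[0].put(v)
    links[0].put(None)

    outputs = []
    while True:
        r = links[-1].get()
        if r is None:
            break
        outputs.append(r)
    for t in threads:
        t.join()
    return outputs
```

With two stages and one input vector, the first stage's result flows to the second stage exactly as the abstract describes each CU subset feeding a different subset.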
-
Publication No.: US20230195664A1
Publication Date: 2023-06-22
Application No.: US17558798
Filing Date: 2021-12-22
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Sean KEELY , Joseph L. GREATHOUSE , Hari THANGIRALA , Alan D. SMITH , Milind N. NEMLEKAR
CPC classification number: G06F13/28 , G06F13/1668
Abstract: A method for software management of DMA transfer commands includes receiving a DMA transfer command instructing a data transfer by a first processor device. Based at least in part on a determination of runtime system resource availability, a device different from the first processor device is assigned to assist in transferring at least a first portion of the data. In some embodiments, the DMA transfer command instructs the first processor device to write a copy of data to a third processor device. Software analyzes congestion at a shared communications bus and initiates the DMA transfer via a multi-hop communications path to bypass the congested bus.
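The abstract's idea of splitting a DMA command based on runtime availability can be sketched as a planning function. The function name, the 0.5 availability threshold, and the even split are assumptions for illustration only; the patented method's actual policy is not specified here.

```python
def plan_dma_transfer(cmd, device_load, bus_congested):
    """Illustrative software DMA planner (not the patented algorithm):
    assign the least-loaded device other than the source to assist with
    a portion of the copy, and route the source's share over a multi-hop
    path when the shared bus is congested.

    cmd: {"src": device name, "size": bytes to transfer}
    device_load: mapping of device name -> load in [0, 1]
    bus_congested: True if the direct shared bus is congested
    """
    segments = []
    helpers = [d for d in device_load if d != cmd["src"]]
    helper = min(helpers, key=device_load.get) if helpers else None
    remaining = cmd["size"]
    if helper is not None and device_load[helper] < 0.5:  # assumed threshold
        assist = remaining // 2                           # assumed even split
        segments.append({"engine": helper, "bytes": assist, "route": "direct"})
        remaining -= assist
    route = "multi-hop" if bus_congested else "direct"
    segments.append({"engine": cmd["src"], "bytes": remaining, "route": route})
    return segments
```

For example, with a lightly loaded second GPU and a congested bus, the plan splits the copy and detours the source's half over a multi-hop path.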
-
Publication No.: US20200183734A1
Publication Date: 2020-06-11
Application No.: US16211954
Filing Date: 2018-12-06
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Milind N. NEMLEKAR
Abstract: A graphics processing unit (GPU) schedules recurrent matrix multiplication operations at different subsets of compute units (CUs) of the GPU. The GPU includes a scheduler that receives sets of recurrent matrix multiplication operations, such as multiplication operations associated with a recurrent neural network (RNN). The multiple operations associated with, for example, an RNN layer are fused into a single kernel, which is scheduled such that one workgroup is assigned per compute unit, thus assigning different recurrent matrix multiplication operations to different subsets of the CUs of the GPU. In addition, via software synchronization of the different workgroups, the GPU pipelines the assigned matrix multiplication operations so that each subset of CUs provides its multiplication results to a different subset, and so that each subset of CUs executes at least a portion of the multiplication operations concurrently.