Patent search ap:("Advanced Micro Devices Page Inc.") AND inv:"Samuel Lawrence Wasmundt"

1.

Granted invention patent
Managing variations among nodes in parallel system frameworks (Active)

Publication number: US10355966B2

Publication date: 2019-07-16

Application number: US15081558

Application date: 2016-03-25

Applicant: Advanced Micro Devices, Inc.

Inventor: Samuel Lawrence Wasmundt, Leonardo Piga, Indrani Paul, Wei Huang, Manish Arora

IPC: H04L12/26, H04L29/08

Abstract: Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.
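
The mapping step in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the metric used here (sum of relative deviations from the cluster mean), the sensor field names, and the per-task criticality scores are all assumptions, since the abstract does not define them.

```python
from statistics import mean

def variability_metrics(node_samples):
    """Return {node: metric}. The metric is an assumed stand-in: the sum,
    over all readings, of the relative deviation from the cluster mean.
    Assumes every node reports every reading and no mean is zero."""
    keys = sorted({k for s in node_samples.values() for k in s})
    cluster_mean = {k: mean(s[k] for s in node_samples.values()) for k in keys}
    return {
        node: sum(abs(s[k] - cluster_mean[k]) / cluster_mean[k] for k in keys)
        for node, s in node_samples.items()
    }

def map_tasks(tasks, metrics):
    """Pair tasks (most critical first) with nodes (lowest metric first)."""
    nodes = sorted(metrics, key=metrics.get)
    ordered = sorted(tasks, key=lambda t: t[1], reverse=True)
    return {name: node for (name, _), node in zip(ordered, nodes)}

# Hypothetical sensor and performance data for a three-node cluster.
samples = {
    "node0": {"temp_c": 60.0, "runtime_s": 10.0},
    "node1": {"temp_c": 75.0, "runtime_s": 13.0},
    "node2": {"temp_c": 63.0, "runtime_s": 10.0},
}
# Hypothetical (task name, criticality) pairs.
tasks = [("halo_exchange", 0.2), ("solver", 0.9), ("io", 0.5)]

metrics = variability_metrics(samples)
assignment = map_tasks(tasks, metrics)
```

With this data, the most critical task is placed on the node whose readings sit closest to the cluster average, and the least critical task on the outlier node.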

2.

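Invention patent application
MANAGING VARIATIONS AMONG NODES IN PARALLEL SYSTEM FRAMEWORKS (Under examination, published)

Publication number: US20170279703A1

Publication date: 2017-09-28

Application number: US15081558

Application date: 2016-03-25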

Applicant: Advanced Micro Devices, Inc.

Inventor: Samuel Lawrence Wasmundt, Leonardo Piga, Indrani Paul, Wei Huang, Manish Arora

IPC: H04L12/26, H04L29/08

CPC classification number: H04L43/16, H04L43/08, H04L67/10, H04L67/1008

Abstract: Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.

3.

Granted invention patent
Hardware accelerated convolution (Active)

Publication number: US11657119B2

Publication date: 2023-05-23

Application number: US16557911

Application date: 2019-08-30

Applicant: Advanced Micro Devices, Inc.

Inventor: Swapnil P. Sakharshete, Samuel Lawrence Wasmundt, Maxim V. Kazakov, Vineet Goel

IPC: G06F17/16, G06N3/08

CPC classification number: G06F17/16, G06N3/08

Abstract: A processing device is provided which includes memory configured to store data and a processor configured to determine, based on convolutional parameters associated with an image, a virtual general matrix-matrix multiplication (GEMM) space of a virtual GEMM space output matrix and generate, in the virtual GEMM space output matrix, a convolution result by matrix multiplying the data corresponding to a virtual GEMM space input matrix with the data corresponding to a virtual GEMM space filter matrix. The processing device also includes convolutional mapping hardware configured to map, based on the convolutional parameters, positions of the virtual GEMM space input matrix to positions of an image space of the image.
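
To make the idea of a virtual GEMM space concrete, the sketch below computes a convolution as a matrix product without ever building the input matrix: each (row, column) position of the virtual input matrix is translated to an image position on the fly. The abstract assigns this mapping to dedicated hardware; the software version here, the row/column ordering, and all function names are assumptions chosen for illustration.

```python
def split_row(row, R, S):
    """Decode a virtual-matrix row into (channel, filter row, filter col)."""
    c, rs = divmod(row, R * S)
    r, s = divmod(rs, S)
    return c, r, s

def virtual_to_image(row, col, R, S, Q, stride=1, pad=0):
    """Map a virtual GEMM input-matrix position to an image (c, y, x)."""
    c, r, s = split_row(row, R, S)
    p, q = divmod(col, Q)  # output pixel addressed by this column
    return c, p * stride + r - pad, q * stride + s - pad

def conv_as_virtual_gemm(image, filters, stride=1, pad=0):
    """image: [C][H][W], filters: [K][C][R][S]; returns a K x (P*Q) matrix."""
    C, H, W = len(image), len(image[0]), len(image[0][0])
    K, R, S = len(filters), len(filters[0][0]), len(filters[0][0][0])
    P = (H + 2 * pad - R) // stride + 1
    Q = (W + 2 * pad - S) // stride + 1
    out = [[0] * (P * Q) for _ in range(K)]
    for k in range(K):
        for col in range(P * Q):
            acc = 0
            for row in range(C * R * S):
                c, r, s = split_row(row, R, S)
                _, y, x = virtual_to_image(row, col, R, S, Q, stride, pad)
                if 0 <= y < H and 0 <= x < W:  # positions in the padding read as zero
                    acc += filters[k][c][r][s] * image[c][y][x]
            out[k][col] = acc
    return out

image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # C=1, H=W=3
filters = [[[[1, 0], [0, 1]]]]               # K=1, C=1, R=S=2
result = conv_as_virtual_gemm(image, filters)
```

The point of the sketch is that the expanded input matrix exists only as an index calculation, so no extra copy of the image data is stored.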

4.

Invention patent application
POWER REDUCTION FOR MACHINE LEARNING ACCELERATOR BACKGROUND (Active)

Publication number: US20210303987A1

Publication date: 2021-09-30

Application number: US16831711

Application date: 2020-03-26

Applicant: Advanced Micro Devices, Inc.

Inventor: Maxim V. Kazakov, Samuel Lawrence Wasmundt

IPC: G06N3/08, G06N3/04

Abstract: A technique for performing neural network operations is disclosed. The technique includes identifying a first matrix tile and a second matrix tile, obtaining first range information for the first matrix tile and second range information for the second matrix tile, selecting a matrix multiplication path based on the first range information and the second range information, and performing a matrix multiplication on the first matrix tile and the second matrix tile using the selected matrix multiplication path to generate a tile matrix multiplication product.
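
One way to picture range-based path selection is sketched below. The abstract does not say what the range information contains or which paths exist, so everything specific here is assumed: the range is taken to be a (min, max) pair per tile, and the three paths ("skip", "narrow", "wide") are hypothetical names.

```python
def tile_range(tile):
    """Assumed range information: the (min, max) of the tile's values."""
    flat = [v for row in tile for v in row]
    return min(flat), max(flat)

def select_path(range_a, range_b):
    """Pick a hypothetical multiplication path from the two tile ranges."""
    if range_a == (0, 0) or range_b == (0, 0):
        return "skip"    # an all-zero tile makes the product zero
    def fits_int8(r):
        return -128 <= r[0] and r[1] <= 127
    return "narrow" if fits_int8(range_a) and fits_int8(range_b) else "wide"

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def tile_matmul(a, b):
    """Return (path, product). In hardware the narrow path would use cheaper
    multipliers; in this sketch both non-skip paths run the same arithmetic."""
    path = select_path(tile_range(a), tile_range(b))
    if path == "skip":
        return path, [[0] * len(b[0]) for _ in a]
    return path, matmul(a, b)

a = [[1, 2], [3, 4]]
skip_case = tile_matmul(a, [[0, 0], [0, 0]])
narrow_case = tile_matmul(a, [[5, 6], [7, 8]])
wide_case = tile_matmul(a, [[1000, 0], [0, 1000]])
```

The power saving in such a scheme would come from the skipped or narrowed arithmetic, which this software sketch can only label, not measure.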

5.

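Invention patent application
HARDWARE ACCELERATED CONVOLUTION (Under examination, published)

Publication number: US20200184002A1

Publication date: 2020-06-11

Application number: US16557911

Application date: 2019-08-30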

Applicant: Advanced Micro Devices, Inc.

Inventor: Swapnil P. Sakharshete, Samuel Lawrence Wasmundt, Maxim V. Kazakov, Vineet Goel

IPC: G06F17/16, G06N3/08

Abstract: A processing device is provided which includes memory configured to store data and a processor configured to determine, based on convolutional parameters associated with an image, a virtual general matrix-matrix multiplication (GEMM) space of a virtual GEMM space output matrix and generate, in the virtual GEMM space output matrix, a convolution result by matrix multiplying the data corresponding to a virtual GEMM space input matrix with the data corresponding to a virtual GEMM space filter matrix. The processing device also includes convolutional mapping hardware configured to map, based on the convolutional parameters, positions of the virtual GEMM space input matrix to positions of an image space of the image.

6.

Invention patent application
MACHINE LEARNING CLUSTER PIPELINE FUSION (Active)

Publication number: US20230004871A1

Publication date: 2023-01-05

Application number: US17364787

Application date: 2021-06-30

Applicant: Advanced Micro Devices, Inc.

Inventor: Swapnil P. Sakharshete, Maxim V. Kazakov, Milind N. Nemlekar, Samuel Lawrence Wasmundt

IPC: G06N20/10, G06F9/38, G06F9/30, G06F17/16, G06N3/02

Abstract: Methods, systems, and devices for pipeline fusion of a plurality of kernels. In some implementations, a first batch of a first kernel is executed on a first processing device to generate a first output of the first kernel based on an input. A first batch of a second kernel is executed on a second processing device to generate a first output of the second kernel based on the first output of the first kernel. A second batch of the first kernel is executed on the first processing device to generate a second output of the first kernel based on the input. The execution of the second batch of the first kernel overlaps at least partially in time with executing the first batch of the second kernel.
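
The batch overlap described in this abstract has the shape of a two-stage producer/consumer pipeline, sketched below with two threads standing in for the two processing devices. The kernels, the queue-based handoff, and all names are hypothetical. Python threads share one interpreter lock, so this shows only the dependency and ordering structure, not real parallel speedup.

```python
import queue
import threading

def kernel_a(x, batch):
    """Hypothetical first kernel: each batch is derived from the same input."""
    return [v + batch for v in x]

def kernel_b(y):
    """Hypothetical second kernel: consumes one output of the first kernel."""
    return sum(y)

def fused_pipeline(x, num_batches):
    handoff = queue.Queue(maxsize=1)  # holds one finished batch of kernel_a
    results = []

    def stage_a():  # stands in for the first processing device
        for batch in range(num_batches):
            handoff.put(kernel_a(x, batch))
        handoff.put(None)  # sentinel: no more batches

    def stage_b():  # stands in for the second processing device
        while True:
            item = handoff.get()
            if item is None:
                break
            results.append(kernel_b(item))

    workers = [threading.Thread(target=stage_a), threading.Thread(target=stage_b)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results

results = fused_pipeline([1, 2, 3], 3)
```

Once stage A has handed off its first batch, it is free to start the second batch while stage B is still working on the first, which is the overlap the abstract refers to.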

7.

Granted invention patent
Virtual space memory bandwidth reduction (Active)

Publication number: US11030095B2

Publication date: 2021-06-08

Application number: US16215298

Application date: 2018-12-10

Applicant: ADVANCED MICRO DEVICES, INC.

Inventor: Swapnil Sakharshete, Samuel Lawrence Wasmundt

IPC: G06F12/06, G06F12/109, G06T1/20, G06F17/16

Abstract: A processing system includes a central processing unit (CPU) and a graphics processing unit (GPU) that has a plurality of compute units. The GPU receives an image from the CPU and determines a total result area in a virtual-matrix-multiplication space of a virtual matrix-multiplication output matrix based on convolutional parameters associated with the image in an image space. The GPU partitions the total result area of the virtual matrix-multiplication output matrix into a plurality of virtual segments. The GPU allocates convolution operations to the plurality of compute units based on each virtual segment of the plurality of virtual segments.
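
The partition-and-allocate step can be sketched as below: the result area of the virtual output matrix is cut into rectangular segments, and each segment is handed to a compute unit. The segment shape and the round-robin allocation policy are assumptions; the abstract says only that allocation is based on the virtual segments.

```python
def partition(rows, cols, seg_rows, seg_cols):
    """Split a rows x cols result area into segments.
    Each segment is (top row, left col, height, width); edge segments shrink."""
    return [
        (r, c, min(seg_rows, rows - r), min(seg_cols, cols - c))
        for r in range(0, rows, seg_rows)
        for c in range(0, cols, seg_cols)
    ]

def allocate(segments, num_compute_units):
    """Assign segments to compute units round-robin (assumed policy)."""
    plan = {cu: [] for cu in range(num_compute_units)}
    for i, seg in enumerate(segments):
        plan[i % num_compute_units].append(seg)
    return plan

segments = partition(4, 6, 2, 4)  # hypothetical 4 x 6 virtual output matrix
plan = allocate(segments, 2)      # hypothetical GPU with two compute units
```

Each compute unit then only needs the image data that its own segments map back to, which is where a reduction in memory traffic could come from.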
