Patent search ap:("Advanced Micro Devices Page Inc.") AND inv:"Li Peng"

1.

Invention application
MATRIX DATA BROADCAST ARCHITECTURE (status: in force)

Publication (announcement) number: US20210191761A1

Publication (announcement) date: 2021-06-24

Application number: US16729811

Application date: 2019-12-30

Applicant: Advanced Micro Devices, Inc.

Inventor: Li Peng, Jian Yang, Chi Tang

IPC: G06F9/48, G06F9/46, G06F9/54, G06F9/30, G06F9/32

Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. The processor core executes a software application with matrix operations. The processor core supports the broadcast of shared data to multiple compute units of the processor core. A compiler or other code assigns thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read accesses to a memory subsystem for the shared data, the processor core generates a single access request. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted by the processor core.
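
The single-request idea in this abstract can be illustrated with a small sketch. It is not taken from the patent: the names `BroadcastRequest`, `cu_mask`, `coalesce_reads`, and `targets` are hypothetical, and the bitmask encoding of the receiving compute units is an assumption, since the abstract only says the request "includes information to identify the multiple compute units".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BroadcastRequest:
    """One memory read whose result is delivered to several compute units."""
    address: int  # address of the shared data (hypothetical field)
    cu_mask: int  # assumed encoding: bit i set means compute unit i receives the data


def coalesce_reads(reads):
    """Merge per-compute-unit reads of the same address into single requests.

    `reads` is an iterable of (compute_unit_id, address) pairs.
    """
    masks = defaultdict(int)
    for cu_id, address in reads:
        masks[address] |= 1 << cu_id
    return [BroadcastRequest(addr, mask) for addr, mask in sorted(masks.items())]


def targets(request):
    """List the compute unit ids selected by a request's mask."""
    return [i for i in range(request.cu_mask.bit_length())
            if (request.cu_mask >> i) & 1]


# Four compute units read the same shared block; one also reads a private block.
reads = [(0, 0x100), (1, 0x100), (2, 0x100), (3, 0x100), (1, 0x200)]
requests = coalesce_reads(reads)
```

Here the five individual reads collapse into two requests: one for address `0x100` addressed to compute units 0 through 3, and one for `0x200` addressed to compute unit 1 only.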

2.

Invention application
METHOD FOR MATRIX DATA BROADCAST IN PARALLEL PROCESSING (status: in force)

Publication (announcement) number: US20210191758A1

Publication (announcement) date: 2021-06-24

Application number: US16723016

Application date: 2019-12-20

Applicant: Advanced Micro Devices, Inc.

Inventor: Li Peng, Jian Yang, Chi Tang

IPC: G06F9/48, G06F9/54, G06F12/0815, G06F12/084, G06F15/78

Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the application with data items, and grouping the resulting work units into thread groups. The application assigns the thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read access to a memory subsystem for the shared data, a single access request is generated. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted.
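
As a rough illustration of assigning thread groups by shared data, the sketch below models a tiled matrix multiplication in which the thread group for output tile (i, j) reads row-block i of A and column-block j of B. The tiling scheme and the names `thread_groups`, `assign_round`, and `count_requests` are assumptions made for this example; the abstract does not specify them.

```python
from collections import Counter


def thread_groups(tiles):
    """Map each output tile (i, j) to the data items its thread group reads."""
    return {(i, j): {("A", i), ("B", j)}
            for i in range(tiles) for j in range(tiles)}


def assign_round(groups, row):
    """Place the thread groups of one output row on compute units 0, 1, 2, ...

    These groups all read the same row-block of A, so that block is shared data.
    """
    members = sorted(key for key in groups if key[0] == row)
    return {cu: groups[key] for cu, key in enumerate(members)}


def count_requests(assignment, broadcast):
    """Count memory read requests for one round of co-scheduled thread groups."""
    wanted = Counter(item for reads in assignment.values() for item in reads)
    # With broadcast, each distinct data item is fetched once for all its readers.
    return len(wanted) if broadcast else sum(wanted.values())


groups = thread_groups(4)
round0 = assign_round(groups, row=0)
without_broadcast = count_requests(round0, broadcast=False)
with_broadcast = count_requests(round0, broadcast=True)
```

For a 4x4 grid of tiles, one round of four thread groups issues 8 reads without broadcast and 5 with it, because the shared row-block of A is fetched once instead of four times.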

3.

Invention application
METHOD FOR MATRIX DATA BROADCAST IN PARALLEL PROCESSING (status: in force)

Publication (announcement) number: US20220129312A1

Publication (announcement) date: 2022-04-28

Application number: US17571374

Application date: 2022-01-07

Applicant: Advanced Micro Devices, Inc.

Inventor: Li Peng, Jian Yang, Chi Tang

IPC: G06F9/48, G06F9/54, G06F15/78, G06F12/084, G06F12/0815

Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the application with data items, and grouping the resulting work units into thread groups. The application assigns the thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read access to a memory subsystem for the shared data, a single access request is generated. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted.

4.

Granted invention patent
Matrix data broadcast architecture (status: in force)

Publication (announcement) number: US11609785B2

Publication (announcement) date: 2023-03-21

Application number: US16729811

Application date: 2019-12-30

Applicant: Advanced Micro Devices, Inc.

Inventor: Li Peng, Jian Yang, Chi Tang

IPC: G06F9/30, G06F9/38, G06F9/48, G06F9/46, G06F9/32, G06F9/54

Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. The processor core executes a software application with matrix operations. The processor core supports the broadcast of shared data to multiple compute units of the processor core. A compiler or other code assigns thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read accesses to a memory subsystem for the shared data, the processor core generates a single access request. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted by the processor core.

5.

Granted invention patent
Method for matrix data broadcast in parallel processing (status: in force)

Publication (announcement) number: US11275612B2

Publication (announcement) date: 2022-03-15

Application number: US16723016

Application date: 2019-12-20

Applicant: Advanced Micro Devices, Inc.

Inventor: Li Peng, Jian Yang, Chi Tang

IPC: G06F9/48, G06F9/54, G06F15/78, G06F12/084, G06F12/0815

Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the application with data items, and grouping the resulting work units into thread groups. The application assigns the thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read access to a memory subsystem for the shared data, a single access request is generated. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted.

6.

Granted invention patent
Method for matrix data broadcast in parallel processing (status: in force)

Publication (announcement) number: US11983560B2

Publication (announcement) date: 2024-05-14

Application number: US17571374

Application date: 2022-01-07

Applicant: Advanced Micro Devices, Inc.

Inventor: Li Peng, Jian Yang, Chi Tang

IPC: G06F9/48, G06F9/54, G06F12/0815, G06F12/084, G06F15/78

CPC classification number: G06F9/4881, G06F9/542, G06F12/0815, G06F12/084, G06F15/7807

Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the application with data items, and grouping the resulting work units into thread groups. The application assigns the thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read access to a memory subsystem for the shared data, a single access request is generated. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted.
