-
Publication No.: US20210191761A1
Publication date: 2021-06-24
Application No.: US16729811
Filing date: 2019-12-30
Applicant: Advanced Micro Devices, Inc.
Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. The processor core executes a software application with matrix operations. The processor core supports the broadcast of shared data to multiple compute units of the processor core. A compiler or other code assigns thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read accesses to a memory subsystem for the shared data, the processor core generates a single access request. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted by the processor core.
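The single-request broadcast described in this abstract can be illustrated with a minimal Python sketch. It assumes a request format that carries the target address plus a bitmask of destination compute units. `BroadcastRead`, `coalesce_reads` and `service` are hypothetical names, and the abstract does not specify the actual request encoding or how the processor core performs the broadcast in hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BroadcastRead:
    """Hypothetical request format: one address plus a destination CU bitmask."""
    address: int
    cu_mask: int  # bit i set => compute unit i receives the returned data


def coalesce_reads(reads):
    """Merge per-CU reads of the same address into one request per address.

    `reads` is an iterable of (cu_id, address) pairs; requests are returned
    in first-seen address order (dicts preserve insertion order).
    """
    masks = {}
    for cu_id, address in reads:
        masks[address] = masks.get(address, 0) | (1 << cu_id)
    return [BroadcastRead(address, mask) for address, mask in masks.items()]


def service(requests, memory, num_cus):
    """Read memory once per request and broadcast the data to every CU in the mask."""
    received = [[] for _ in range(num_cus)]
    for req in requests:
        data = memory[req.address]  # the single memory-subsystem access
        for cu_id in range(num_cus):
            if (req.cu_mask >> cu_id) & 1:
                received[cu_id].append(data)
    return received


if __name__ == "__main__":
    memory = {0x1000: "A-tile", 0x2000: "B-tile"}
    reads = [(0, 0x1000), (1, 0x1000), (2, 0x1000), (3, 0x2000)]
    requests = coalesce_reads(reads)
    print(len(reads), "->", len(requests))     # 4 -> 2
    print([bin(r.cu_mask) for r in requests])  # ['0b111', '0b1000']
    print(service(requests, memory, num_cus=4))
```

In this toy run, four per-CU reads collapse into two memory accesses because compute units 0 to 2 read the same address; only the destination mask grows.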
-
Publication No.: US20210191758A1
Publication date: 2021-06-24
Application No.: US16723016
Filing date: 2019-12-20
Applicant: Advanced Micro Devices, Inc.
IPC: G06F9/48 , G06F9/54 , G06F12/0815 , G06F12/084 , G06F15/78
Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the application with data items, and grouping the resulting work units into thread groups. The application assigns the thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read accesses to a memory subsystem for the shared data, a single access request is generated. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted.
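The thread-group creation and shared-data-aware assignment described in this abstract can be sketched for a tiled matrix multiplication. This is an illustration under assumptions, not the patented method: each output tile is treated as one thread group, thread groups are dispatched to compute units in waves of one group per compute unit, and a data block read by several compute units in the same wave is counted as a single request with a compute-unit bitmask. All names (`gemm_thread_groups`, `schedule_waves`, `wave_requests`) are hypothetical.

```python
from collections import defaultdict


def gemm_thread_groups(m_tiles, n_tiles):
    """Hypothetical tiling of C = A x B: thread group (i, j) computes output
    tile C[i][j] and reads A row-block i and B column-block j."""
    return {(i, j): {("A", i), ("B", j)}
            for i in range(m_tiles) for j in range(n_tiles)}


def schedule_waves(groups, num_cus):
    """Assign thread groups to CUs in waves of num_cus groups. Sorting by
    (i, j) keeps groups that share an A row-block in the same wave."""
    ordered = sorted(groups)
    return [ordered[k:k + num_cus] for k in range(0, len(ordered), num_cus)]


def wave_requests(wave, groups):
    """Return {data block: CU bitmask} for one wave (CU index = position in
    the wave). Each entry stands for one request broadcast to its mask."""
    masks = defaultdict(int)
    for cu_id, group in enumerate(wave):
        for block in groups[group]:
            masks[block] |= 1 << cu_id
    return dict(masks)


if __name__ == "__main__":
    groups = gemm_thread_groups(m_tiles=2, n_tiles=4)
    waves = schedule_waves(groups, num_cus=4)
    per_cu_reads = sum(len(groups[g]) for wave in waves for g in wave)
    broadcast_reqs = sum(len(wave_requests(w, groups)) for w in waves)
    print(per_cu_reads, "->", broadcast_reqs)         # 16 -> 10
    print(wave_requests(waves[0], groups)[("A", 0)])  # 15: CUs 0-3 share A row-block 0
```

Sorting by row index is only one simple heuristic for co-scheduling thread groups that share data; the abstract does not state which grouping or assignment policy is used.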
-
Publication No.: US20220129312A1
Publication date: 2022-04-28
Application No.: US17571374
Filing date: 2022-01-07
Applicant: Advanced Micro Devices, Inc.
IPC: G06F9/48 , G06F9/54 , G06F15/78 , G06F12/084 , G06F12/0815
Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the application with data items, and grouping the resulting work units into thread groups. The application assigns the thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read accesses to a memory subsystem for the shared data, a single access request is generated. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted.
-
Publication No.: US11609785B2
Publication date: 2023-03-21
Application No.: US16729811
Filing date: 2019-12-30
Applicant: Advanced Micro Devices, Inc.
Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. The processor core executes a software application with matrix operations. The processor core supports the broadcast of shared data to multiple compute units of the processor core. A compiler or other code assigns thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read accesses to a memory subsystem for the shared data, the processor core generates a single access request. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted by the processor core.
-
Publication No.: US11275612B2
Publication date: 2022-03-15
Application No.: US16723016
Filing date: 2019-12-20
Applicant: Advanced Micro Devices, Inc.
IPC: G06F9/48 , G06F9/54 , G06F15/78 , G06F12/084 , G06F12/0815
Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the application with data items, and grouping the resulting work units into thread groups. The application assigns the thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read accesses to a memory subsystem for the shared data, a single access request is generated. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted.
-
Publication No.: US11983560B2
Publication date: 2024-05-14
Application No.: US17571374
Filing date: 2022-01-07
Applicant: Advanced Micro Devices, Inc.
IPC: G06F9/48 , G06F9/54 , G06F12/0815 , G06F12/084 , G06F15/78
CPC classification number: G06F9/4881 , G06F9/542 , G06F12/0815 , G06F12/084 , G06F15/7807
Abstract: Systems, apparatuses, and methods for efficient parallel execution of multiple work units in a processor by reducing a number of memory accesses are disclosed. A computing system includes a processor core with a parallel data architecture. One or more of a software application and firmware implement matrix operations and support the broadcast of shared data to multiple compute units of the processor core. The application creates thread groups by matching compute kernels of the application with data items, and grouping the resulting work units into thread groups. The application assigns the thread groups to compute units based on detecting shared data among the compute units. Rather than send multiple read accesses to a memory subsystem for the shared data, a single access request is generated. The single access request includes information to identify the multiple compute units for receiving the shared data when broadcasted.