Publication number: US20240320293A1
Publication date: 2024-09-26
Application number: US18125454
Filing date: 2023-03-23
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Nallani Bhaskar, Mithun Mohan Kadavil Madana Mohanan
CPC classification number: G06F17/16, G06F9/4881
Abstract: Techniques are described in which an estimated optimal thread quantity for matrix multiplication is determined and implemented based on dimensions of the input matrices being multiplied and one or more kernel parameters that vary based on processor architecture. An efficient factorization of the estimated optimal thread quantity is based on a number of blocks along a first dimension of a first input matrix, and a number of blocks along a dimension n of a second input matrix B, with both numbers being based on the kernel parameters. In certain embodiments, a command processor of a parallel processor determines an estimated optimal thread quantity for performing a matrix multiplication command responsive to receiving the matrix multiplication command, and then schedules that estimated optimal thread quantity of kernel threads to execute the matrix multiplication command in parallel.
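
The abstract does not give formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the claimed method. It assumes two hypothetical kernel parameters, `mr` and `nr` (the register-tile sizes of a micro-kernel along the rows of the first matrix and along dimension n of the second matrix), a hypothetical hardware cap `max_threads`, and an illustrative cost heuristic that picks the factorization with the smallest per-thread tile of blocks.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def estimate_threads(m, n, mr, nr, max_threads):
    """Estimate a thread count from matrix dimensions and kernel parameters.

    mr, nr and max_threads are hypothetical names used for illustration;
    the abstract only says the block counts depend on kernel parameters.
    """
    blocks_m = ceil_div(m, mr)  # blocks along the rows of the first matrix
    blocks_n = ceil_div(n, nr)  # blocks along dimension n of the second matrix
    # Assumption: never use more threads than there are independent blocks.
    threads = max(1, min(max_threads, blocks_m * blocks_n))
    return blocks_m, blocks_n, threads


def factorize(threads, blocks_m, blocks_n):
    """Split `threads` into (tm, tn) with tm * tn == threads.

    Assumed heuristic: minimise the largest number of blocks that any
    single thread has to process. Ties keep the smallest tm.
    """
    best = None
    for tm in range(1, threads + 1):
        if threads % tm:
            continue  # tm must divide the thread count exactly
        tn = threads // tm
        cost = ceil_div(blocks_m, tm) * ceil_div(blocks_n, tn)
        if best is None or cost < best[0]:
            best = (cost, tm, tn)
    return best[1], best[2]


if __name__ == "__main__":
    bm, bn, t = estimate_threads(m=1024, n=256, mr=6, nr=16, max_threads=64)
    tm, tn = factorize(t, bm, bn)
    print(f"blocks: {bm} x {bn}, threads: {t}, grid: {tm} x {tn}")
```

In this sketch a small problem is automatically given fewer threads than the hardware offers, which matches the abstract's point that the thread quantity should follow the input dimensions rather than a fixed setting.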