Publication No.: US20250021622A1
Publication Date: 2025-01-16
Application No.: US18769710
Filing Date: 2024-07-11
Applicant: NVIDIA Corporation
Inventor: Sean Jeffrey Treichler, Yury Uralsky, Karthik Vaidyanathan, Franz Petrik Clarberg, Jeffrey Alan Bolz, John Matthew Burgess, Ajay Sudarshan Tirumala
IPC: G06F17/16
Abstract: Disclosed are systems and techniques for efficient vector-matrix multiply operations across parallel processing unit threads. The techniques include receiving first data of a first thread, the first data comprising a first input vector and a first matrix. The techniques further include receiving second data of a second thread, the second data comprising a second input vector and a second matrix. The techniques further include combining the first input vector and the second input vector into an input matrix and generating a result matrix at least by multiplying the input matrix by the first matrix using a matrix-multiply circuit. The techniques further include separating the result matrix into a first result value and a second result value, the first result value corresponding to the first thread and the second result value corresponding to the second thread.
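Illustrative sketch (not part of the published record): the combine, multiply, and separate flow described in the abstract can be modeled in a few lines of plain Python. The function names `matmul` and `batched_vector_matrix` are hypothetical, the nested-list `matmul` merely stands in for the matrix-multiply circuit, and the sketch assumes that the threads being combined reference the same matrix, which is one reading of the abstract's "multiplying the input matrix by the first matrix".

```python
def matmul(a, b):
    """Stand-in for the matrix-multiply circuit: (rows x inner) times (inner x cols)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def batched_vector_matrix(thread_data):
    """thread_data holds one (input_vector, matrix) pair per thread.

    Returns one result row per thread, in thread order.
    """
    shared_matrix = thread_data[0][1]
    # Assumption: rows are only combined when every thread uses the same matrix.
    if any(matrix != shared_matrix for _, matrix in thread_data):
        raise ValueError("threads reference different matrices; cannot combine")

    # Combine: stack each thread's input vector as one row of the input matrix.
    input_matrix = [list(vector) for vector, _ in thread_data]

    # Multiply: a single matrix-matrix product replaces per-thread products.
    result_matrix = matmul(input_matrix, shared_matrix)

    # Separate: row i of the result belongs to thread i.
    return [list(row) for row in result_matrix]


weights = [[1, 2], [3, 4]]
first_result, second_result = batched_vector_matrix(
    [([1, 0], weights), ([0, 1], weights)]
)
```

In this toy case each thread's input vector selects one row of `weights`, so `first_result` is `[1, 2]` and `second_result` is `[3, 4]`, the same values each thread would obtain from its own vector-matrix product.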