Efficient Matrix Multiply and Add with a Group of Warps

    Publication number: US20230289398A1

    Publication date: 2023-09-14

    Application number: US17691406

    Filing date: 2022-03-10

    CPC classification number: G06F17/16 G06F9/3001 G06F7/5443

    Abstract: This specification describes techniques for implementing matrix multiply and add (MMA) operations in graphics processing units (GPUs) and other processors. The implementations allow a plurality of warps of threads to collaborate in generating the result matrix by enabling each thread to share its respective register files so that they can be accessed by the datapaths associated with other threads in the group of warps. A state machine circuit controls MMA execution among the warps executing on asynchronous computation units. A group MMA (GMMA) instruction accepts a descriptor as a parameter, where the descriptor may include information regarding the size and formats of input data to be loaded into shared memory and/or the datapath.
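
The collaboration described in the abstract can be pictured with a small host-side model. The sketch below is illustrative only and is not the patented hardware mechanism: it models each "warp" as a loop iteration that owns a band of rows of the result D = A x B + C, while all warps read the same B operand (standing in for shared memory). The names `Descriptor`, `gmma`, and `num_warps`, and the row-band partitioning itself, are assumptions made for this example and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Hypothetical operand descriptor: size and format of an input matrix."""
    rows: int
    cols: int
    fmt: str = "f32"  # assumed format tag; purely informational here


def gmma(a, b, c, desc_b, num_warps=4):
    """Return D = A @ B + C, with rows of D split across a group of warps.

    Each warp is modelled as one loop iteration that accumulates into its
    own band of rows; every warp reads the same B operand.
    """
    m, k = len(a), len(a[0])
    n = desc_b.cols
    assert desc_b.rows == k, "descriptor must match the inner dimension"

    d = [row[:] for row in c]  # accumulators start from C
    rows_per_warp = (m + num_warps - 1) // num_warps  # ceiling division

    for warp in range(num_warps):
        lo = warp * rows_per_warp
        hi = min(lo + rows_per_warp, m)
        for i in range(lo, hi):          # rows owned by this warp
            for j in range(n):
                for p in range(k):
                    d[i][j] += a[i][p] * b[p][j]
    return d


# Example: four warps, each producing one row of a 4x2 result.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0], [0, 1]]
C = [[1, 1], [1, 1], [1, 1], [1, 1]]
D = gmma(A, B, C, Descriptor(rows=2, cols=2))
```

Because B is the identity matrix in this example, each entry of D equals the corresponding entry of A plus one. On real hardware the warps would run concurrently and the operand movement would be handled by the datapath rather than by Python loops.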
