Transposing at-speed in a vector-matrix accelerator

    Publication No.: US12164917B1

    Publication Date: 2024-12-10

    Application No.: US18198387

    Filing Date: 2023-05-17

    Applicant: Google LLC

    Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.
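The data flow in this abstract (reorder at the input, pass through lanes, undo the reordering at the output) can be illustrated with a toy software model. The sketch below is not the patented circuit. It swaps in a well-known skewed-rotation scheme as the reordering, and the function name `transpose_via_switches`, the square-matrix restriction, and the lane layout are all assumptions made for illustration only.

```python
def transpose_via_switches(src):
    """Toy model: transpose a square matrix with a rotate / lane / un-rotate flow.

    Assumption: the "reordering" is a per-row rotation (skew). The abstract
    does not say which permutation the switching circuits apply.
    """
    n = len(src)
    if any(len(row) != n for row in src):
        raise ValueError("this sketch only handles square matrices")

    # Input switching (assumed): rotate row i right by i positions.
    skewed = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            skewed[i][(j + i) % n] = src[i][j]

    # Datapath: lane c receives one element from every row, each stored in
    # its own slot i, so the elements occupy non-overlapping portions.
    lanes = [[skewed[i][c] for i in range(n)] for c in range(n)]

    # Read-out: output row r takes slot (c - r) mod n from lane c.
    gathered = [[lanes[c][(c - r) % n] for c in range(n)] for r in range(n)]

    # Output switching (assumed): undo the skew by rotating row r left by r.
    return [[gathered[r][(k + r) % n] for k in range(n)] for r in range(n)]


if __name__ == "__main__":
    source = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for row in transpose_via_switches(source):
        print(row)
```

In this model every lane carries exactly one element per source row, which is why a full row can move through the datapath in a single step without two elements colliding in the same slot.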

    Custom Scratchpad Memory For Partial Dot Product Reductions

    Publication No.: US20250013432A1

    Publication Date: 2025-01-09

    Application No.: US18218448

    Filing Date: 2023-07-05

    Applicant: Google LLC

    Abstract: Aspects of the disclosed technology include techniques and mechanisms for using a custom scratchpad memory for partial dot product reductions. The custom scratchpad memory may be a special purpose memory that is dedicated to receiving and storing partial dot products determined by matrix multiplier units. Each partial dot product may correspond to tiles of a resultant matrix, where the resultant matrix is the product of matrix multiplication that can use a first matrix representing a user query as a left-hand side operand and a second matrix representing a trained model containing data that may be used to respond to the user query as a right-hand side operand. The custom scratchpad memory may append the tiles determined by the matrix multiplication, where the appended tiles may create the resultant matrix. The custom scratchpad memory may write the resultant matrix to general purpose memory, where it may be used to respond to the user query.
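The tile-and-append flow described in this abstract can be sketched in plain Python. This is a minimal software analogy under stated assumptions, not the hardware design: the names `matmul`, `tiled_matmul_with_scratchpad`, and `tile_cols` are hypothetical, the right-hand side is split by columns (the abstract does not specify the tiling), and ordinary lists stand in for the scratchpad and general purpose memory.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def tiled_matmul_with_scratchpad(lhs, rhs, tile_cols):
    """Compute lhs @ rhs one column tile at a time.

    Assumption: each call to matmul() stands in for one matrix multiplier
    unit producing one tile of the resultant matrix.
    """
    if tile_cols < 1:
        raise ValueError("tile_cols must be at least 1")
    n = len(rhs[0])

    # Stand-in for the scratchpad: one growing output row per lhs row.
    scratchpad = [[] for _ in lhs]

    for start in range(0, n, tile_cols):
        # Slice the next block of columns from the right-hand side operand.
        rhs_tile = [row[start:start + tile_cols] for row in rhs]
        tile = matmul(lhs, rhs_tile)
        # Append the tile to what the scratchpad already holds.
        for out_row, tile_row in zip(scratchpad, tile):
            out_row.extend(tile_row)

    # Stand-in for writing the finished matrix to general purpose memory.
    return [list(row) for row in scratchpad]


if __name__ == "__main__":
    query = [[1, 2]]                  # left-hand side operand
    model = [[1, 2, 3], [4, 5, 6]]    # right-hand side operand
    print(tiled_matmul_with_scratchpad(query, model, tile_cols=2))
```

Because the tiles cover disjoint column ranges, appending them in order reproduces the same matrix as a single untiled multiplication.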

    Transposing At-Speed in a Vector-Matrix Accelerator

    Publication No.: US20240385837A1

    Publication Date: 2024-11-21

    Application No.: US18198387

    Filing Date: 2023-05-17

    Applicant: Google LLC

    Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.
