-
Publication No.: US12164917B1
Publication date: 2024-12-10
Application No.: US18198387
Filing date: 2023-05-17
Applicant: Google LLC
Inventor: Vinayak Anand Gokhale, Matthew Leever Hedlund, Matthew William Ashcraft, Indranil Chakraborty
Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.
-
Publication No.: US20240272904A1
Publication date: 2024-08-15
Application No.: US18109583
Filing date: 2023-02-14
Applicant: Google LLC
Inventor: Matthew Leever Hedlund, Christopher Aaron Clark, Andrew Everett Phelps, Thomas James Norrie, Sushma Honnavara-Prasad, Vinayak Anand Gokhale, Pareesa Ameneh Golnari
CPC classification number: G06F9/30036, G06F9/30032, G06F17/16
Abstract: In a system including vector registers storing right-hand side data and left-hand side data, first and second matrix staging registers, and a systolic array of processing cells for conducting matrix multiplication operations using the right-hand side data and left-hand side data, one or more processors load the right-hand side data from the vector registers to the first matrix staging register based on an instruction indicating whether to transpose the right-hand side data, load the left-hand side data from the vector registers into the second matrix staging register based on another instruction indicating whether to transpose the left-hand side data, load the right-hand side data from the first matrix staging register into the systolic array, and, in a cycle of the matrix multiplication operation, pass one or more columns of the left-hand side data from the second matrix staging register to a column of the systolic array.
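The per-operand transpose flag described in this abstract can be pictured with a small software model. The sketch below is only an illustration under stated assumptions: `load_staging` and `matmul` are hypothetical names, plain Python lists stand in for the vector registers and matrix staging registers, and the systolic array is replaced by an ordinary reference product rather than modeled cell by cell.

```python
from typing import List

Matrix = List[List[int]]


def load_staging(vector_rows: Matrix, transpose: bool) -> Matrix:
    """Copy rows into a (hypothetical) staging register, optionally transposed."""
    if transpose:
        return [list(col) for col in zip(*vector_rows)]
    return [list(row) for row in vector_rows]


def matmul(lhs: Matrix, rhs: Matrix) -> Matrix:
    """Reference product; stands in for the values a systolic array would produce."""
    if len(lhs[0]) != len(rhs):
        raise ValueError("inner dimensions must match")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*rhs)]
            for row in lhs]


# Assumption for the example: the LHS happens to be stored transposed,
# so its load instruction sets the flag while the RHS load does not.
lhs_stored = [[1, 3], [2, 4]]  # transpose of [[1, 2], [3, 4]]
rhs_stored = [[5, 6], [7, 8]]

lhs = load_staging(lhs_stored, transpose=True)
rhs = load_staging(rhs_stored, transpose=False)
product = matmul(lhs, rhs)
```

Keeping the flag on the load step means the multiply itself never needs to know how either operand was laid out in the vector registers.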
-
Publication No.: US20250013432A1
Publication date: 2025-01-09
Application No.: US18218448
Filing date: 2023-07-05
Applicant: Google LLC
Inventor: Vinayak Anand Gokhale, Matthew Leever Hedlund, Rahul Nagarajan, Naveen Muralimanohar, Shriram Nagarajan
Abstract: Aspects of the disclosed technology include techniques and mechanisms for using a custom scratchpad memory for partial dot product reductions. The custom scratchpad memory may be a special purpose memory that is dedicated to receiving and storing partial dot products determined by matrix multiplier units. Each partial dot product may correspond to tiles of a resultant matrix, where the resultant matrix is the product of matrix multiplication that can use a first matrix representing a user query as a left-hand side operand and a second matrix representing a trained model containing data that may be used to respond to the user query as a right-hand side operand. The custom scratchpad memory may append the tiles determined by the matrix multiplication, where the appended tiles may create the resultant matrix. Custom scratchpad memory may write the resultant matrix to general purpose memory, where it may be used to respond to the user query.
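As a rough software analogy for the tile-wise reduction this abstract describes, the sketch below accumulates partial dot products per output tile and then appends the finished tiles into the resultant matrix. It is a minimal sketch, not the patented design: `tiled_matmul`, the `scratchpad` dictionary, and the `tile` parameter are all hypothetical, and a Python dictionary only stands in for a dedicated memory.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def tiled_matmul(lhs: Matrix, rhs: Matrix, tile: int) -> Matrix:
    """Multiply lhs (m x k) by rhs (k x n) one output tile at a time."""
    m, k, n = len(lhs), len(rhs), len(rhs[0])
    if len(lhs[0]) != k:
        raise ValueError("inner dimensions must match")

    # Hypothetical scratchpad: one accumulator per output tile origin.
    scratchpad: Dict[Tuple[int, int], Matrix] = {}
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            rows, cols = min(tile, m - i0), min(tile, n - j0)
            acc = [[0] * cols for _ in range(rows)]
            for k0 in range(0, k, tile):
                k1 = min(k0 + tile, k)
                # Partial dot products over one slice of the shared dimension.
                for i in range(rows):
                    for j in range(cols):
                        acc[i][j] += sum(lhs[i0 + i][kk] * rhs[kk][j0 + j]
                                         for kk in range(k0, k1))
            scratchpad[(i0, j0)] = acc

    # Append the reduced tiles to form the resultant matrix.
    result = [[0] * n for _ in range(m)]
    for (i0, j0), acc in scratchpad.items():
        for i, row in enumerate(acc):
            result[i0 + i][j0:j0 + len(row)] = row
    return result


lhs = [[1, 2, 3], [4, 5, 6]]
rhs = [[7, 8], [9, 10], [11, 12]]
result = tiled_matmul(lhs, rhs, tile=2)
```

The tile size changes only how the work is partitioned, so any positive `tile` should yield the same resultant matrix.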
-
Publication No.: US20240385837A1
Publication date: 2024-11-21
Application No.: US18198387
Filing date: 2023-05-17
Applicant: Google LLC
Inventor: Vinayak Anand Gokhale, Matthew Leever Hedlund, Matthew William Ashcraft, Indranil Chakraborty
Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.
-
Publication No.: US12073216B1
Publication date: 2024-08-27
Application No.: US18109583
Filing date: 2023-02-14
Applicant: Google LLC
Inventor: Matthew Leever Hedlund, Christopher Aaron Clark, Andrew Everett Phelps, Thomas James Norrie, Sushma Honnavara-Prasad, Vinayak Anand Gokhale, Pareesa Ameneh Golnari
CPC classification number: G06F9/30036, G06F9/30032, G06F17/16
Abstract: In a system including vector registers storing right-hand side data and left-hand side data, first and second matrix staging registers, and a systolic array of processing cells for conducting matrix multiplication operations using the right-hand side data and left-hand side data, one or more processors load the right-hand side data from the vector registers to the first matrix staging register based on an instruction indicating whether to transpose the right-hand side data, load the left-hand side data from the vector registers into the second matrix staging register based on another instruction indicating whether to transpose the left-hand side data, load the right-hand side data from the first matrix staging register into the systolic array, and, in a cycle of the matrix multiplication operation, pass one or more columns of the left-hand side data from the second matrix staging register to a column of the systolic array.
-
Publication No.: US20240220202A1
Publication date: 2024-07-04
Application No.: US18168972
Filing date: 2023-02-14
Applicant: Google LLC
Inventor: Matthew Leever Hedlund, Christopher Aaron Clark, Andrew Everett Phelps, Thomas James Norrie, Norman Paul Jouppi, Sushma Honnavara-Prasad, Vinayak Anand Gokhale, Pareesa Ameneh Golnari
CPC classification number: G06F7/5443, G06F7/485, G06F7/4876, G06F15/8046
Abstract: A system and method for matrix multiplication using a systolic array configurable between multiple modes of operation. A systolic processor may receive a data type indicator for the matrix multiplication. For a first data type, the systolic processor may load the right-hand side data from the right-hand matrix register into the data processing cells of the systolic array between row 0 and row M−1, and pass the respective row of the left-hand side data through a corresponding row of the systolic array between rows 0 and M−1. For a second data type, the systolic processor may split each element of the left-hand side data and the right-hand side data into respective first and second element halves, and move each element half through a corresponding row of the systolic array between rows 0 and 2M−1.
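The abstract does not say which data types are involved or how the element halves are recombined. Purely as an arithmetic illustration, and assuming unsigned integer elements split into 8-bit upper and lower halves, the sketch below shows why multiplying halves separately can still reproduce the full-width dot product. `HALF_BITS`, `split`, and `dot_via_halves` are hypothetical names, and the mapping of halves onto rows 0 to 2M−1 of the array is not modeled.

```python
from typing import List, Tuple

HALF_BITS = 8  # assumed width of each element half
HALF_MASK = (1 << HALF_BITS) - 1


def split(x: int) -> Tuple[int, int]:
    """Return the (upper, lower) halves of an unsigned element."""
    return x >> HALF_BITS, x & HALF_MASK


def dot_via_halves(lhs: List[int], rhs: List[int]) -> int:
    """Dot product computed only from products of element halves."""
    total = 0
    for a, b in zip(lhs, rhs):
        a_hi, a_lo = split(a)
        b_hi, b_lo = split(b)
        # (a_hi*2^h + a_lo) * (b_hi*2^h + b_lo), expanded term by term.
        total += ((a_hi * b_hi) << (2 * HALF_BITS)) \
            + ((a_hi * b_lo + a_lo * b_hi) << HALF_BITS) \
            + (a_lo * b_lo)
    return total


lhs = [300, 65535, 7]
rhs = [513, 2, 40000]
halved = dot_via_halves(lhs, rhs)
direct = sum(a * b for a, b in zip(lhs, rhs))
```

Each half-product needs only a multiplier half as wide as the original element, which is one plausible motivation for routing halves through twice as many rows.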