-
Publication No.: US20230359584A1
Publication date: 2023-11-09
Application No.: US18351916
Filing date: 2023-07-13
Applicant: Groq, Inc.
CPC classification: G06F15/825, G06N20/00
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
-
Publication No.: US11809514B2
Publication date: 2023-11-07
Application No.: US17519425
Filing date: 2021-11-04
Applicant: Groq, Inc.
IPC classification: G06F17/15, G06F17/16, G06N20/10, G06N3/08, G06F7/76, G06N7/00, G06F7/544, G06F9/54, G06N3/04, G06F18/2137
CPC classification: G06F17/153, G06F7/5443, G06F7/76, G06F9/544, G06F17/16, G06F18/2137, G06N3/04, G06N3/08, G06N7/00, G06N20/10
Abstract: A method comprises receiving a kernel used to convolve with an input tensor. For a first dimension of the kernel, a square block of values is generated for each single dimensional vector of the kernel, the block including all rotations of that single dimensional vector. For each additional dimension of the kernel, blocks of an immediately preceding dimension are grouped into sets of blocks, each set of blocks including blocks of the immediately preceding dimension that are aligned along a vector that is parallel to the axis of the dimension; and one or more blocks of values are generated for the additional dimension, each block including all rotations of blocks within each of the sets of blocks of the immediately preceding dimension. The block of values corresponding to the last dimension in the additional dimensions of the kernel is output as the expanded kernel.
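The rotation-block construction described in this abstract can be illustrated with a minimal Python sketch. The function names `rotation_block` and `expand_2d`, the rotation direction, and the block layout for the second dimension are assumptions made for illustration; the abstract does not specify them, and this is not the patented implementation.

```python
def rotation_block(vector):
    """Square block whose rows are all cyclic rotations of `vector`.

    Assumption: row i is the vector rotated right by i positions.
    """
    n = len(vector)
    return [[vector[(j - i) % n] for j in range(n)] for i in range(n)]


def expand_2d(kernel):
    """Hypothetical expansion of a 2-D kernel (a list of equal-length rows).

    Each row becomes a rotation block; the blocks are then arranged so that
    every block-row is a cyclic rotation of the sequence of blocks.
    """
    blocks = [rotation_block(row) for row in kernel]
    k = len(blocks)
    n = len(kernel[0])
    out = []
    for bi in range(k):          # which rotation of the block sequence
        for r in range(n):       # row inside each block
            row = []
            for bj in range(k):  # block position within the block-row
                row.extend(blocks[(bj - bi) % k][r])
            out.append(row)
    return out


first_dim = rotation_block([1, 2, 3])
expanded = expand_2d([[1, 2], [3, 4]])
```

Under these assumptions, a kernel of shape k x n yields an expanded block of shape (k*n) x (k*n).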
-
Publication No.: US20220365582A1
Publication date: 2022-11-17
Application No.: US17732408
Filing date: 2022-04-28
Applicant: Groq, Inc.
Inventor: JEFFREY WERNER
IPC classification: G06F1/3206, G06N20/00, G06F9/38
Abstract: Embodiments are directed to a power grid distribution for a deterministic processor. The deterministic processor includes a plurality of functional slices, a plurality of data transport lanes for transporting data across the functional slices along a first spatial dimension, and a plurality of instruction control units (ICUs). An instruction in each subset of the ICUs includes a functional slice specific operation code and is transported to a corresponding functional slice along a second spatial dimension orthogonal to the first spatial dimension. A power supply grid of metal traces is spread across the first and second spatial dimensions for supplying power to the functional slices and the ICUs. At least a portion of the metal traces are routed as discontinuous stubs along the first spatial dimension or the second spatial dimension.
-
Publication No.: US11360934B1
Publication date: 2022-06-14
Application No.: US17105976
Filing date: 2020-11-27
Applicant: Groq, Inc.
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
-
Publication No.: US20220075598A1
Publication date: 2022-03-10
Application No.: US17351044
Filing date: 2021-06-17
Applicant: Groq, Inc.
Abstract: In one embodiment, multiplier circuitry multiplies operands of a first format. One or more storage register circuits store digital bits corresponding to an operand and another operand of the first format. A decomposing circuit decomposes the operand into a first plurality of operands, and the other operand into a second plurality of operands. Each multiplier circuit multiplies a respective first operand of the first plurality of operands with a respective second operand of the second plurality of operands to generate a corresponding partial result of a plurality of partial results. An accumulator circuit accumulates the plurality of partial results using a second format to generate a complete result of the second format that is stored in the accumulator circuit. A conversion circuit truncates the complete result of the second format and converts the truncated result into an output result of an output format.
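The decompose, multiply, accumulate, and truncate flow in this abstract can be sketched in software. The example below is only an arithmetic illustration under stated assumptions: the operands are treated as unsigned 16-bit integers split into two 8-bit parts, the accumulator is an unbounded Python integer, and the names `decompose`, `decomposed_multiply`, and `truncate` are hypothetical. The abstract does not name the formats the circuit actually uses.

```python
def decompose(x, parts=2, width=8):
    """Split an unsigned integer into `parts` chunks of `width` bits, low chunk first."""
    mask = (1 << width) - 1
    return [(x >> (i * width)) & mask for i in range(parts)]


def decomposed_multiply(a, b, parts=2, width=8):
    """Multiply a and b by summing partial products of their chunks.

    Each chunk pair plays the role of one narrow multiplier; the running sum
    plays the role of the wider accumulator.
    """
    acc = 0
    for i, a_part in enumerate(decompose(a, parts, width)):
        for j, b_part in enumerate(decompose(b, parts, width)):
            # Shift each partial product to its weight before accumulating.
            acc += (a_part * b_part) << ((i + j) * width)
    return acc


def truncate(value, drop_bits):
    """Drop the lowest `drop_bits` bits (an assumed stand-in for the conversion step)."""
    return value >> drop_bits


full = decomposed_multiply(0x1234, 0xABCD)
narrowed = truncate(full, 16)
```

Here four 8-bit by 8-bit partial products reproduce the full 16-bit by 16-bit product before the final narrowing step.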
-
Publication No.: US11216734B1
Publication date: 2022-01-04
Application No.: US16526922
Filing date: 2019-07-30
Applicant: Groq, Inc.
Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
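As a rough illustration of how a dependency DAG can drive instruction ordering, the sketch below uses Python's standard `graphlib` module (Python 3.9+) to derive one valid execution order from a small graph. The node names and the `schedule` helper are hypothetical, and the abstract does not describe the compiler's actual scheduling method.

```python
from graphlib import TopologicalSorter


def schedule(dependencies):
    """Return one valid execution order for a DAG.

    `dependencies` maps each node to the set of nodes it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical model graph: each operation lists its predecessors.
model_dag = {
    "matmul": {"input"},
    "relu": {"matmul"},
    "output": {"relu"},
}

order = schedule(model_dag)
```

Because the same graph can be re-scheduled for a different target, keeping the DAG alongside the compiled instructions, as the abstract describes, lets a later compile step start from the recorded dependencies.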
-
Publication No.: US11210594B1
Publication date: 2021-12-28
Application No.: US16526916
Filing date: 2019-07-30
Applicant: Groq, Inc.
Abstract: A system receives a predictive model and receives one or more runtime constraints. The system generates a directed acyclic graph (DAG) of the predictive model indicating dependencies. The system compiles the predictive model into first instructions for a first processor based on the one or more runtime constraints and the DAG. The system packages first instructions, the one or more runtime constraints, and the DAG of the predictive model in a first binary. The system recompiles the predictive model into second instructions for a second processor based on the runtime constraints and the DAG stored in the first processor. The system packages the second instructions, the DAG, and the runtime constraints in a second binary.
-
Publication No.: US11165428B1
Publication date: 2021-11-02
Application No.: US16932632
Filing date: 2020-07-17
Applicant: Groq, Inc.
Inventor: Jonathan Ross, Dinesh Maheshwari
IPC classification: H03K19/173, H03K19/17728, H03K19/21, G06F9/38, H03K19/017
Abstract: The present disclosure provides circuits and methods that can be used to update configurations. An example circuit can include a plurality of hLUTs and a plurality of registers configured to propagate a set of data or a portion thereof to the plurality of hLUTs. An hLUT of the plurality of hLUTs can have a transformation unit comprising transformation circuitry configured to (i) receive the set of data or the portion thereof from a register of the plurality of registers and (ii) transform the set of data or the portion thereof into configurations for the hLUT.
-
Publication No.: US11042360B1
Publication date: 2021-06-22
Application No.: US16986007
Filing date: 2020-08-05
Applicant: Groq, Inc.
Abstract: In one embodiment, in a first mode, first and second input operands having a first data type are multiplied using one or more of a plurality of multipliers, and in a second mode, a plurality of input operands having a second data type are multiplied using the plurality of multipliers. Accordingly, multiplier circuitry may process different input data types and share circuitry across the different modes. In some embodiments, in the first mode, products may be converted to a third data type, and in the second mode, multiple products may be concatenated. Values in the third data type, in the first mode, and concatenated values having the second data type, in the second mode, may be added across different multimodal multipliers to form a multiply-accumulator. In some embodiments, the plurality of multiply-accumulators may be configured in series.
-
Publication No.: US20210157767A1
Publication date: 2021-05-27
Application No.: US17104465
Filing date: 2020-11-25
Applicant: Groq, Inc.
Abstract: A computational array is implemented in which all operands and results are loaded or output from a single side of the array. The computational array comprises a plurality of cells arranged in n rows and m columns, each configured to produce a processed value based upon a weight value and an activation value. The cells receive weight and activation values via colinear weight and activation transmission channels that each extend across a first side edge of the computational array to provide weight values and activation values to the cells of the array. In addition, result values produced at a top cell of each of the m columns of the array are routed through the array to be output from the same first side edge of the array at a same relative timing at which the result values were produced.