PROCESSOR ARCHITECTURE
    Invention Publication

    Publication No.: US20240176737A1

    Publication Date: 2024-05-30

    Application No.: US18394442

    Filing Date: 2023-12-22

    Applicant: Groq, Inc.

    Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows is configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
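    The timing-based dispatch described above can be illustrated with a small software model. The sketch below is a loose analogy, not Groq's implementation: each slice keeps a cycle-indexed schedule standing in for the predetermined temporal relationship and decides what to do with an arriving operand purely from the cycle count, with no metadata attached to the data. All names (Slice, MEM, ALU, run) are illustrative.

class Slice:
    """One functional slice; its schedule maps an arrival cycle to an operation,
    standing in for the predetermined data/instruction timing relationship."""

    def __init__(self, name, schedule):
        self.name = name
        self.schedule = schedule

    def process(self, cycle, operand):
        op = self.schedule.get(cycle)      # no metadata travels with the operand
        return operand if op is None else op(operand)

# Two slices: a memory-like slice that only forwards, and an ALU-like slice
# scheduled to double a value arriving on cycle 2 and increment one on cycle 3.
mem = Slice("MEM", {})
alu = Slice("ALU", {2: lambda x: x * 2, 3: lambda x: x + 1})

def run(stream):
    """Stream operands across the two slices, one hop per cycle."""
    results = []
    for cycle, value in enumerate(stream):
        value = mem.process(cycle, value)
        value = alu.process(cycle + 1, value)   # reaches the ALU one cycle later
        results.append(value)
    return results

print(run([10, 20, 30, 40]))   # -> [10, 40, 31, 40]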

    Processor instruction dispatch configuration

    Publication No.: US11868804B1

    Publication Date: 2024-01-09

    Application No.: US16951938

    Filing Date: 2020-11-18

    Applicant: Groq, Inc.

    Abstract: A processor comprises a computational array of computational elements and an instruction dispatch circuit. The computational elements receive data operands via data lanes extending along a first dimension, and process the operands based upon instructions received from the instruction dispatch circuit via instruction lanes extending along a second dimension. The instruction dispatch circuit receives raw instructions, and comprises an instruction dispatch unit (IDU) processor that processes a set of raw instructions to generate processed instructions for dispatch to the computational elements, where the number of processed instructions is not equal to the number of instructions in the set of raw instructions. The processed instructions are dispatched to columns of the computational array via a plurality of instruction queues, wherein an instruction vector is shifted between adjacent instruction queues in a first direction, and the queues dispatch instructions to the computational elements in a second direction.
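    As a rough illustration of this dispatch flow (a sketch under assumptions, not the patented circuit), the model below has an IDU step that expands macro-style raw instructions, so the processed count differs from the raw count, and a dispatch step that moves instructions across per-column queues in one direction before popping them toward the array in the other. The ('rep', op, n) encoding is made up for this example.

from collections import deque

def idu_process(raw_instructions):
    """IDU step: expand ('rep', op, n) macros; everything else passes through,
    so the processed count generally differs from the raw count."""
    processed = []
    for ins in raw_instructions:
        if ins[0] == "rep":
            processed.extend([ins[1]] * ins[2])
        else:
            processed.append(ins[0])
    return processed

def dispatch(processed, num_columns):
    """Distribute processed instructions round-robin across adjacent column
    queues (a stand-in for shifting in the first direction), then pop one
    instruction per column toward the array (the second direction)."""
    queues = [deque() for _ in range(num_columns)]
    for step, ins in enumerate(processed):
        queues[step % num_columns].append(ins)
    return [q.popleft() if q else "nop" for q in queues]

raw = [("rep", "mul", 3), ("add",), ("rep", "load", 2)]
processed = idu_process(raw)      # 6 processed instructions from 3 raw ones
print(processed)                  # ['mul', 'mul', 'mul', 'add', 'load', 'load']
print(dispatch(processed, 4))     # ['mul', 'mul', 'mul', 'add']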

    Memory design for a processor
    Invention Grant

    Publication No.: US11868250B1

    Publication Date: 2024-01-09

    Application No.: US17582895

    Filing Date: 2022-01-24

    Applicant: Groq, Inc.

    Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows is configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.

    Spatial locality transform of matrices

    Publication No.: US11537687B2

    Publication Date: 2022-12-27

    Application No.: US16686870

    Filing Date: 2019-11-18

    Applicant: Groq, Inc.

    Abstract: A method comprises accessing a flattened input stream that includes a set of parallel vectors representing a set of input values of a kernel-sized tile of an input tensor that is to be convolved with a kernel. An expanded kernel is received that is generated by permuting values from the kernel. A control pattern is received that includes a set of vectors, each corresponding to an output value position for the kernel-sized tile of the output and indicating a vector of the flattened input stream from which to access input values. The method further comprises generating, for each output position of each kernel-sized tile of the output, a dot product between a first vector that includes values of the flattened input stream as selected by the control pattern, and a second vector in the expanded kernel corresponding to the output position.
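    For the 1-D case, this flow can be sketched as follows, assuming a circulant expanded kernel and a control pattern that picks, lane by lane, between the current kernel-sized tile and the next one. The helper names (expanded_kernel, control_pattern, conv1d_tiled) are hypothetical, not from the patent.

import numpy as np

def expanded_kernel(k):
    """Row p holds the kernel rotated so lane s carries k[(s - p) % K]."""
    K = len(k)
    return np.array([[k[(s - p) % K] for s in range(K)] for p in range(K)])

def control_pattern(K):
    """control[p][s] == 0 -> read lane s of the current tile,
    control[p][s] == 1 -> read lane s of the next tile."""
    return np.array([[0 if s >= p else 1 for s in range(K)] for p in range(K)])

def conv1d_tiled(x, k):
    """Sliding dot products of x with k, computed one kernel-sized tile at a
    time via a control-pattern lane select and one dot product per output."""
    K = len(k)
    ek, ctrl = expanded_kernel(k), control_pattern(K)
    pad = (-len(x)) % K + K                        # pad to a tile boundary plus one zero tile
    x = np.concatenate([x, np.zeros(pad)])
    out = []
    for t in range(0, len(x) - K, K):
        cur, nxt = x[t:t + K], x[t + K:t + 2 * K]  # two parallel input vectors
        for p in range(K):
            lanes = np.where(ctrl[p] == 0, cur, nxt)
            out.append(float(lanes @ ek[p]))       # one dot product per output position
    return np.array(out)

x = np.arange(8, dtype=float)
k = np.array([1.0, 2.0, 3.0])
print(conv1d_tiled(x, k)[:6])   # -> [ 8. 14. 20. 26. 32. 38.]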

    Multichip timing synchronization circuits and methods

    Publication No.: US11474557B2

    Publication Date: 2022-10-18

    Application No.: US17021746

    Filing Date: 2020-09-15

    Applicant: Groq, Inc.

    IPC Classes: G06F1/12 G06N5/02

    Abstract: In one embodiment, the present disclosure includes multichip timing synchronization circuits and methods. In one embodiment, hardware counters in different systems are synchronized. Programs on the systems may include synchronization instructions. A second system executes a synchronization instruction and, in response, synchronizes a local software counter to a local hardware counter. The software counter on the second system may be delayed by a fixed period of time corresponding to a program delay on the first system. The software counter on the second system may further be delayed by an offset to bring the software counters on the two systems into sync.
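    A toy model of this counter scheme (assumptions only, not the disclosed circuit): each chip carries a free-running hardware counter, and the second chip's synchronization instruction copies its hardware counter into a software counter and then subtracts a fixed program delay plus an offset so the two software counters line up. The class and the numeric values are illustrative.

class Chip:
    def __init__(self, hw_start):
        self.hw_counter = hw_start     # free-running hardware counter
        self.sw_counter = None         # program-visible software counter

    def tick(self, cycles=1):
        self.hw_counter += cycles

    def execute_sync(self, program_delay=0, offset=0):
        # Synchronize the software counter to the local hardware counter, then
        # delay it by the first chip's known program delay plus an offset that
        # accounts for the skew between the two hardware counters.
        self.sw_counter = self.hw_counter - program_delay - offset

chip0 = Chip(hw_start=1000)
chip1 = Chip(hw_start=1234)            # started later, so its counter is ahead

# Chip 0 reaches its sync point first; chip 1 reaches it 7 cycles later
# (the fixed program delay), and the hardware counters differ by 234.
chip0.execute_sync()
chip1.tick(7)
chip1.execute_sync(program_delay=7, offset=234)

print(chip0.sw_counter, chip1.sw_counter)   # -> 1000 1000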

    Tiled Switch Matrix Data Permutation Circuit

    Publication No.: US20220236954A1

    Publication Date: 2022-07-28

    Application No.: US17717629

    Filing Date: 2022-04-11

    Applicant: Groq, Inc.

    IPC Classes: G06F7/76 G06F7/78

    Abstract: Embodiments of the present disclosure pertain to a switch matrix circuit including a data permutation circuit. In one embodiment, the switch matrix comprises a plurality of adjacent switching blocks configured along a first axis, wherein the plurality of adjacent switching blocks each receive data and switch control settings along a second axis. The switch matrix includes a permutation circuit comprising, in each switching block, a plurality of switching stages spanning a plurality of adjacent switching blocks and at least one switching stage that does not span adjacent switching blocks. The permutation circuit receives data in a first pattern and outputs the data in a second pattern. The data permutation performed by the switching stages is based on the particular switch control settings received in the adjacent switching blocks along the second axis.
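    The staged permutation can be mimicked in software as below. This is a loose model, not the circuit: one stage operates inside each switching block and another spans adjacent blocks, and the resulting permutation depends only on the per-block control settings delivered along the second axis. The stage behaviors (reverse within a block, swap adjacent blocks) are arbitrary choices for illustration.

def stage_within_block(data, controls, block_size):
    """Per-block stage: if a block's control bit is set, reverse that block."""
    out = list(data)
    for b, ctrl in enumerate(controls):
        lo = b * block_size
        if ctrl:
            out[lo:lo + block_size] = reversed(out[lo:lo + block_size])
    return out

def stage_across_blocks(data, controls, block_size):
    """Cross-block stage: if a block's control bit is set, swap that block's
    contents with the next adjacent block (spanning two switching blocks)."""
    out = list(data)
    for b, ctrl in enumerate(controls[:-1]):
        if ctrl:
            lo, mid, hi = b * block_size, (b + 1) * block_size, (b + 2) * block_size
            out[lo:mid], out[mid:hi] = out[mid:hi], out[lo:mid]
    return out

data = list(range(8))                  # first pattern, 4 blocks of 2 lanes
within_ctrl = [1, 0, 1, 0]             # control settings arriving per block
across_ctrl = [1, 0, 0, 0]

permuted = stage_across_blocks(
    stage_within_block(data, within_ctrl, block_size=2), across_ctrl, block_size=2)
print(permuted)                        # second pattern: [2, 3, 1, 0, 5, 4, 6, 7]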

    DATA STRUCTURES WITH MULTIPLE READ PORTS

    Publication No.: US20220101896A1

    Publication Date: 2022-03-31

    Application No.: US17397158

    Filing Date: 2021-08-09

    Applicant: Groq, Inc.

    Abstract: A memory structure having 2^m read ports allowing for concurrent access to n data entries can be constructed using three memory structures each having 2^(m-1) read ports. The three memory structures include two structures providing access to the two halves of the n data entries, and a difference structure providing access to difference data between the halves of the n data entries. Each pair of the 2^m ports is connected to a respective port of each of the 2^(m-1)-port data structures, such that each port of the pair can access data entries of a first half of the n data entries either by accessing the structure storing that half directly, or by accessing both the difference structure and the structure containing the second half to reconstruct the data entries of the first half, thus allowing a pair of ports to concurrently access any of the stored data entries in parallel.
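    A minimal sketch of the two-port case (m = 1), under the common assumption that the difference data is a bitwise XOR of corresponding entries from the two halves; the class and method names are hypothetical. One port reads its half directly while the other reconstructs the requested entry from the XOR structure and the opposite half, so both ports can target the same half in the same cycle.

class TwoPortRead:
    def __init__(self, entries):
        assert len(entries) % 2 == 0
        half = len(entries) // 2
        self.a = entries[:half]                              # first half of the n entries
        self.b = entries[half:]                              # second half
        self.diff = [x ^ y for x, y in zip(self.a, self.b)]  # difference structure
        self.half = half

    def read_pair(self, addr0, addr1):
        """Serve two reads per cycle even when both fall in the same half:
        port 0 reads its half directly; port 1 reconstructs via the XOR."""
        def direct(addr):
            return self.a[addr] if addr < self.half else self.b[addr - self.half]

        def reconstruct(addr):
            if addr < self.half:                             # rebuild A from D and B
                return self.diff[addr] ^ self.b[addr]
            return self.diff[addr - self.half] ^ self.a[addr - self.half]

        return direct(addr0), reconstruct(addr1)

mem = TwoPortRead([7, 13, 42, 9])      # halves: [7, 13] and [42, 9]
print(mem.read_pair(0, 1))             # both addresses in the first half -> (7, 13)

    In this sketch each of the three underlying structures is read at most once per pair of reads, which is what lets two single-ported halves plus a difference structure behave like a two-port memory.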

    Processor architecture
    Invention Grant

    Publication No.: US11243880B1

    Publication Date: 2022-02-08

    Application No.: US16132243

    Filing Date: 2018-09-14

    Applicant: Groq, Inc.

    IPC Classes: G06F12/02 G06F3/06

    Abstract: A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows is configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.

    Expanded kernel generation
    Invention Grant

    Publication No.: US11204976B2

    Publication Date: 2021-12-21

    Application No.: US16686864

    Filing Date: 2019-11-18

    Applicant: Groq, Inc.

    Abstract: A method comprises receiving a kernel used to convolve with an input tensor. For the first dimension of the kernel, a square block of values is generated for each single-dimensional vector of the kernel, the block including all rotations of that single-dimensional vector. For each additional dimension of the kernel, blocks of the immediately preceding dimension are grouped into sets of blocks, each set including blocks of the immediately preceding dimension that are aligned along a vector parallel to the axis of that dimension; and, for the additional dimension, one or more blocks of values are generated, each block including all rotations of the blocks within each of the sets of blocks of the immediately preceding dimension. The block of values corresponding to the last of the additional dimensions of the kernel is output as the expanded kernel.
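    The nested-rotation construction can be sketched for a 2-D kernel as follows (an illustrative reading of the abstract, not the patented procedure): the first dimension yields a block of all rotations of each kernel row, and the second dimension arranges all rotations of those row blocks into a block-circulant expanded kernel. The function names are made up for the example.

import numpy as np

def rotations(vec):
    """Square block containing all rotations of a 1-D vector."""
    n = len(vec)
    return np.array([np.roll(vec, r) for r in range(n)])

def expanded_kernel_2d(kernel):
    """Arrange all rotations of the per-row blocks into the expanded kernel."""
    row_blocks = [rotations(row) for row in kernel]        # first dimension
    rows, cols = kernel.shape
    out = np.zeros((rows * cols, rows * cols))
    for r in range(rows):                                  # second dimension:
        for b in range(rows):                              # all rotations of the
            out[r * cols:(r + 1) * cols,                   # row blocks themselves
                b * cols:(b + 1) * cols] = row_blocks[(b - r) % rows]
    return out

k = np.array([[1., 2.],
              [3., 4.]])
print(expanded_kernel_2d(k))
# -> [[1. 2. 3. 4.]
#     [2. 1. 4. 3.]
#     [3. 4. 1. 2.]
#     [4. 3. 2. 1.]]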