Patent search ap:("Graphcore Limited") AND inv:"Simon Christian Knowles" Page 2

11.

Granted invention patent
Repeat instruction for loading and/or executing code in a claimable repeat cache a specified number of times (In force)

Publication No.: US11567768B2

Publication date: 2023-01-31

Application No.: US16276895

Application date: 2019-02-15

Applicant: Graphcore Limited

Inventor: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore, Jonathan Louis Ferguson

IPC: G06F9/30, G06F9/38, G06F12/0875

Abstract: A processor is disclosed including: a barrel-threaded execution unit for executing concurrent threads, and a repeat cache shared between the concurrent threads. The processor's instruction set includes a repeat instruction which takes a repeat count operand. When the repeat cache is not claimed and the repeat instruction is executed in a first thread, a portion of code is cached from the first thread into the repeat cache, the state of the repeat cache is changed to record it as claimed, and the cached code is executed a number of times. When the repeat instruction is then executed in a further thread, then the already-cached portion of code is again executed a respective number of times, each time from the repeat cache. For each of the first and further instructions, the repeat count operand in the respective instruction specifies the number of times to execute the cached code.
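The claim-then-reuse behaviour described in this abstract can be sketched as a toy model. This is an illustrative sketch only, not the patented implementation: the class `RepeatCache`, its `repeat` method, and the representation of code as a list of strings are all hypothetical, and the model assumes a further thread simply reuses whatever the first thread cached.

```python
class RepeatCache:
    """Toy model of a repeat cache shared between threads (hypothetical names)."""

    def __init__(self):
        self.claimed = False  # state bit recording whether the cache is claimed
        self.code = []        # cached portion of code

    def repeat(self, thread_code, count):
        """Model a repeat instruction whose repeat-count operand is `count`.

        Returns the list of operations executed from the cache.
        """
        if not self.claimed:
            # The first thread to execute the repeat instruction caches its
            # code and records the cache as claimed.
            self.code = list(thread_code)
            self.claimed = True
        # Claimed now or earlier, the cached code runs `count` times,
        # each time from the cache.
        executed = []
        for _ in range(count):
            executed.extend(self.code)
        return executed


cache = RepeatCache()
first = cache.repeat(["mul", "add"], count=2)   # first thread claims the cache
further = cache.repeat(["ignored"], count=3)    # further thread reuses cached code
```

Each call supplies its own count, mirroring the statement that the repeat count operand of the respective instruction sets the number of executions.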

12.

Granted invention patent
Data exchange pathways between pairs of processing units in columns in a computer (In force)

Publication No.: US11561926B2

Publication date: 2023-01-24

Application No.: US17648517

Application date: 2022-01-20

Applicant: Graphcore Limited

Inventor: Stephen Felix, Simon Christian Knowles

IPC: G06F15/80, G06F9/30, G06F9/52

Abstract: A time deterministic computer is architected so that exchange code compiled for one set of tiles, e.g., a column, can be reused on other sets. The computer comprises: a plurality of processing units each having an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by its associated processing unit; the processing units arranged in columns, each column having a base processing unit proximate the switching fabric and multiple processing units one adjacent the other in respective positions in the direction of the column.

13.

Granted invention patent
Synchronization amongst processor tiles (In force)

Publication No.: US11023290B2

Publication date: 2021-06-01

Application No.: US15885972

Application date: 2018-02-01

Applicant: Graphcore Limited

Inventor: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix

IPC: G06F9/30, G06F9/52, G06F9/38, G06F15/80, G06N20/00, G06F9/48

Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction causes a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgment being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.
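The mode-selected barrier in this abstract can be illustrated with a small model. It is a sketch under stated assumptions rather than the patented design: `SyncLogic`, `sync_request`, and the mode names `"pair"` and `"all"` are hypothetical, and suspended instruction issue is represented simply by an empty return value.

```python
class SyncLogic:
    """Toy synchronization logic: one barrier per mode (hypothetical names)."""

    def __init__(self, groups):
        # groups maps a mode (the sync instruction's operand) to its tile group.
        self.groups = {mode: frozenset(tiles) for mode, tiles in groups.items()}
        self.pending = {mode: set() for mode in groups}

    def sync_request(self, tile, mode):
        """A tile executes the sync instruction with operand `mode`.

        Returns the set of tiles acknowledged, or an empty set while the
        requesting tile must stay suspended.
        """
        group = self.groups[mode]
        if tile not in group:
            raise ValueError(f"tile {tile} is not a member of mode {mode!r}")
        self.pending[mode].add(tile)
        if self.pending[mode] == group:
            # Requests received from every tile in the group: acknowledge all.
            self.pending[mode] = set()
            return set(group)
        return set()


logic = SyncLogic({"pair": {0, 1}, "all": {0, 1, 2, 3}})
waiting = logic.sync_request(0, "pair")   # tile 0 is suspended
released = logic.sync_request(1, "pair")  # last member arrives, group released
```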

14.

Granted invention patent
Synchronization in a multi-tile processing array (In force)

Publication No.: US10963003B2

Publication date: 2021-03-30

Application No.: US16165978

Application date: 2018-10-19

Applicant: Graphcore Limited

Inventor: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey

IPC: G06F1/12, G06F9/30, G06N20/00, G06N5/02, G06N5/04

Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection wires, the data packet being destined for at least one recipient processing unit but having no destination identifier, and at a predetermined switch time the recipient processing unit executes a switch control instruction from its local program to control its switching circuitry to connect its input set of wires to the switching fabric to receive the data packet at a receive time, the transmit time, switch time and receive time being governed by the common clock with respect to the synchronisation signal.
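A small simulation can make the idea of packets without a destination identifier concrete: delivery depends only on which sender each recipient's switching circuitry is listening to when the packet arrives. The sketch below is illustrative, not the patented mechanism. The function `simulate_exchange`, the tile names, and the single uniform `latency` between transmit and receive are all assumptions.

```python
def simulate_exchange(sends, switches, latency):
    """Toy time-deterministic exchange (hypothetical model).

    sends:    list of (transmit_time, sender, payload); no destination field.
    switches: list of (switch_time, recipient, source), i.e. the recipient
              connects its input wires to `source` at `switch_time`.
    latency:  assumed fixed number of cycles from transmit to receive.
    Returns a dict mapping each recipient to the payloads it received.
    """
    received = {}
    recipients = {recipient for _, recipient, _ in switches}
    for transmit_time, sender, payload in sends:
        receive_time = transmit_time + latency
        for recipient in recipients:
            # Find the source selected by the recipient's most recent switch
            # control instruction at or before the receive time.
            selected = None
            for switch_time, who, source in sorted(switches):
                if who == recipient and switch_time <= receive_time:
                    selected = source
            if selected == sender:
                received.setdefault(recipient, []).append(payload)
    return received


sends = [(0, "tile0", "a"), (5, "tile1", "b")]
switches = [(0, "tile2", "tile0"), (4, "tile2", "tile1"), (0, "tile3", "tile1")]
result = simulate_exchange(sends, switches, latency=2)
```

Because all times are counted on one clock, a schedule like this can be fixed ahead of time, which is what lets the packet omit any address.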

15.

Granted invention patent
Compiler method (In force)

Publication No.: US10802536B2

Publication date: 2020-10-13

Application No.: US15886053

Application date: 2018-02-01

Applicant: Graphcore Limited

Inventor: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey

IPC: G06F1/12, G06F9/52, G06F9/30, G06F9/54, G06F9/38, G06F15/173, G06N20/00

Abstract: The invention relates to a computer implemented method of generating multiple programs to deliver a computerised function, each program to be executed in a processing unit of a computer comprising a plurality of processing units each having instruction storage for holding a local program, an execution unit for executing the local program and data storage for holding data, a switching fabric connected to an output interface of each processing unit and connectable to an input interface of each processing unit by switching circuitry controllable by each processing unit, and a synchronisation module operable to generate a synchronisation signal, the method comprising: generating a local program for each processing unit comprising a sequence of executable instructions; determining for each processing unit a relative time of execution of instructions of each local program whereby a local program allocated to one processing unit is scheduled to execute with a predetermined delay relative to a synchronisation signal a send instruction to transmit at least one data packet at a predetermined transmit time, relative to the synchronisation signal, destined for a recipient processing unit but having no destination identifier, and a local program allocated to the recipient processing unit is scheduled to execute at a predetermined switch time a switch control instruction to control the switching circuitry to connect its processing unit wire to the switching fabric to receive the data packet at a receive time.

16.

Invention application
DOUBLE LOAD INSTRUCTION (Under examination, published)

Publication No.: US20200233670A1

Publication date: 2020-07-23

Application No.: US16389682

Application date: 2019-04-19

Applicant: Graphcore Limited

Inventor: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore

IPC: G06F9/345, G06F9/30, G06N3/04

Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the double-load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.
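The pairing of a fixed stride with a register-specified stride can be shown with a short sketch. It is illustrative only: the function `double_load`, the post-increment of both addresses, and the default fixed stride of 1 are assumptions not stated in the abstract.

```python
def double_load(memory, addr_fixed, addr_var, stride_reg, fixed_stride=1):
    """Toy double-load: two loads in one instruction (hypothetical model).

    The first address advances by a fixed stride; the second advances by the
    value held in a variable stride register (`stride_reg`).
    Returns (value_fixed, value_var, next_addr_fixed, next_addr_var).
    """
    value_fixed = memory[addr_fixed]
    value_var = memory[addr_var]
    return value_fixed, value_var, addr_fixed + fixed_stride, addr_var + stride_reg


memory = list(range(10, 26))   # memory[i] == 10 + i
addr_fixed, addr_var = 0, 0
stride_reg = 3                 # contents of the variable stride register
loads = []
for _ in range(3):
    a, b, addr_fixed, addr_var = double_load(memory, addr_fixed, addr_var, stride_reg)
    loads.append((a, b))
```

In this run the first stream walks consecutive elements while the second visits every third element.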

17.

Invention application
LOAD-STORE INSTRUCTION (Under examination, published)

Publication No.: US20200210187A1

Publication date: 2020-07-02

Application No.: US16276872

Application date: 2019-02-15

Applicant: Graphcore Limited

Inventor: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore

IPC: G06F9/345, G06F9/30, G06F9/38, G06F17/16, G06N3/02

Abstract: A processor having an instruction set including a load-store instruction having operands specifying, from amongst the registers in at least one register file, a respective destination of each of two load operations, a respective source of a store operation, and a pair of address registers arranged to hold three memory addresses, the three memory addresses being a respective load address for each of the two load operations and a respective store address for the store operation. The load-store instruction further includes three immediate stride operands each specifying a respective stride value for each of the two load addresses and one store address, wherein at least some possible values of each immediate stride operand specify the respective stride value by specifying one of a plurality of fields within a stride register in one of the one or more register files, each field holding a different stride value.
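The indirection from an immediate stride operand to a field of a stride register can be sketched as follows. The encoding here is hypothetical: the 8-bit field width, the rule that immediate 0 means a default stride of 1, and the name `resolve_stride` are assumptions chosen for illustration, since the abstract says only that some immediate values select a field.

```python
FIELD_BITS = 8  # assumed field width; not taken from the abstract


def resolve_stride(immediate, stride_register):
    """Map an immediate stride operand to a stride value (hypothetical encoding).

    immediate == 0 : assumed default stride of 1 (no field lookup).
    immediate == n : stride held in field n-1 of `stride_register`.
    """
    if immediate == 0:
        return 1
    shift = (immediate - 1) * FIELD_BITS
    return (stride_register >> shift) & ((1 << FIELD_BITS) - 1)


# Pack three different strides (7, 2, 4) into fields 0, 1 and 2.
stride_register = (4 << 16) | (2 << 8) | 7
strides = [resolve_stride(imm, stride_register) for imm in (0, 1, 2, 3)]
```

A small immediate can thus choose among several strides without the instruction encoding the stride itself.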

18.

Invention application
Synchronization in a Multi-Tile Processing Arrangement (Under examination, published)

Publication No.: US20200089499A1

Publication date: 2020-03-19

Application No.: US16688305

Application date: 2019-11-19

Applicant: Graphcore Limited

Inventor: Simon Christian Knowles, Alan Graham Alexander

IPC: G06F9/30, G06F9/38, G06F15/80, G06F15/173, G06F9/46, G06F9/52, G06N20/00, G06F9/48

Abstract: A processing system comprising multiple tiles and an interconnect between the tiles. The interconnect is used to communicate between a group of some or all of the tiles according to a bulk synchronous parallel scheme, whereby each tile in the group performs an on-tile compute phase followed by an inter-tile exchange phase with the exchange phase being held back until all tiles in the group have completed the compute phase. Each tile in the group has a local exit state upon completion of the compute phase. The instruction set comprises a synchronization instruction for execution by each tile upon completion of its compute phase to signal a sync request to logic in the interconnect. In response to receiving the sync request from all the tiles in the group, the logic releases the next exchange phase and also makes available an aggregated state of all the tiles in the group.
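The combination of a barrier with an aggregated exit state can be modelled in a few lines. This is a sketch, not the patented logic: `BspSyncLogic` is a hypothetical name, exit states are modelled as booleans, and aggregation by logical AND is an assumption, since the abstract does not say how the states are combined.

```python
class BspSyncLogic:
    """Toy BSP sync logic that aggregates local exit states (hypothetical)."""

    def __init__(self, group):
        self.group = frozenset(group)
        self.exit_states = {}  # tile -> local exit state for this superstep

    def sync_request(self, tile, local_exit_state):
        """A tile finishes its compute phase and signals a sync request.

        Returns None while the exchange phase is held back, or the aggregated
        exit state once every tile in the group has requested.
        """
        self.exit_states[tile] = bool(local_exit_state)
        if set(self.exit_states) != self.group:
            return None
        # Assumed aggregation: logical AND over all local exit states.
        aggregated = all(self.exit_states.values())
        self.exit_states = {}  # reset for the next superstep
        return aggregated


logic = BspSyncLogic({0, 1, 2})
results = [
    logic.sync_request(0, True),
    logic.sync_request(1, True),
    logic.sync_request(2, False),  # last tile arrives; exchange is released
]
```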

19.

Granted invention patent
Parallel computing (In force)

Publication No.: US10585716B2

Publication date: 2020-03-10

Application No.: US15885949

Application date: 2018-02-01

Applicant: Graphcore Limited

Inventor: Simon Christian Knowles

IPC: G06F9/46, G06F9/52, G06F13/16, G06F9/48

Abstract: A method for executing a computer program, the method implemented by a processor comprising a plural number of computing units and an interconnect connected to the computing units, wherein each computing unit comprises a processing unit and a memory having at least two memory ports, each port assignable to one or more respective regions of the memory, wherein the method comprises at each computing unit: performing an initial step of the program to write: an initial output value to an output region of the memory, and an initial input value to an input region of the memory; and performing a subsequent step of the program by: in a compute phase: assigning one of the two ports to both the input region and the output region; executing code sequences on the processing unit to compute an output set of one or more new output values, and writing the output set to the output region, the output set computed from the initial output and initial input values, each of which is retrieved via said one port in the compute phase; when the compute phase has completed, in an exchange phase: assigning a first of the two ports to the output region and a second of the two ports to the input region; and retrieving a new output value of the output set from the output region via said first port and sending the retrieved value to a different computing unit via the interconnect, and receiving via the interconnect a new input value which has been computed by a different computing unit in the subsequent step and writing the received value to the input region via said second port.

20.

Invention application
SYNCHRONIZATION IN A MULTI-TILE, MULTI-CHIP PROCESSING ARRANGEMENT (Under examination, published)

Publication No.: US20190121784A1

Publication date: 2019-04-25

Application No.: US15886138

Application date: 2018-02-01

Applicant: Graphcore Limited

Inventor: Daniel John Pelham Wilkinson, Stephen Felix, Richard Luke Southwell Osborne, Simon Christian Knowles, Alan Graham Alexander, Ian James Quinn

IPC: G06F15/80, G06F9/52, G06F15/173

Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.
