-
Publication No.: US20190121388A1
Publication Date: 2019-04-25
Application No.: US15886053
Filing Date: 2018-02-01
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
Abstract: The invention relates to a computer implemented method of generating multiple programs to deliver a computerised function, each program to be executed in a processing unit of a computer comprising a plurality of processing units each having instruction storage for holding a local program, an execution unit for executing the local program and data storage for holding data, a switching fabric connected to an output interface of each processing unit and connectable to an input interface of each processing unit by switching circuitry controllable by each processing unit, and a synchronisation module operable to generate a synchronisation signal, the method comprising: generating a local program for each processing unit comprising a sequence of executable instructions; determining for each processing unit a relative time of execution of instructions of each local program whereby a local program allocated to one processing unit is scheduled to execute with a predetermined delay relative to a synchronisation signal a send instruction to transmit at least one data packet at a predetermined transmit time, relative to the synchronisation signal, destined for a recipient processing unit but having no destination identifier, and a local program allocated to the recipient processing unit is scheduled to execute at a predetermined switch time a switch control instruction to control the switching circuitry to connect its processing unit wire to the switching fabric to receive the data packet at a receive time.
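The compile-time timing relationship described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the latencies `FABRIC_DELAY` and `SWITCH_DELAY`, the `Schedule` type and the `schedule_exchange` function are all hypothetical names and values chosen for the example. All cycles are counted relative to the synchronisation signal.

```python
from dataclasses import dataclass

# Hypothetical latencies in clock cycles (assumptions for illustration only).
FABRIC_DELAY = 5  # cycles for a packet to travel from the sender to the recipient's switch
SWITCH_DELAY = 2  # cycles between executing a switch control instruction and the switch taking effect


@dataclass(frozen=True)
class Schedule:
    send_cycle: int     # cycle at which the sender executes its send instruction
    switch_cycle: int   # cycle at which the recipient executes its switch control instruction
    receive_cycle: int  # cycle at which the packet reaches the recipient


def schedule_exchange(send_cycle: int) -> Schedule:
    """Derive the recipient's switch time from the sender's transmit time.

    The packet carries no destination identifier, so delivery relies only on
    the recipient connecting to the fabric at the right cycle.
    """
    receive_cycle = send_cycle + FABRIC_DELAY
    switch_cycle = receive_cycle - SWITCH_DELAY
    return Schedule(send_cycle, switch_cycle, receive_cycle)


plan = schedule_exchange(send_cycle=10)
print(plan)
```

Because every quantity is fixed when the programs are generated, no handshake or address decoding is needed at run time in this model.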
-
Publication No.: US12141092B2
Publication Date: 2024-11-12
Application No.: US17658124
Filing Date: 2022-04-06
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
IPC: G06F15/173, G06F8/41, G06F9/38, G06F15/80
Abstract: The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instructions which, when executed, implement: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.
-
Publication No.: US11900109B2
Publication Date: 2024-02-13
Application No.: US15886331
Filing Date: 2018-02-01
Applicant: Graphcore Limited
Inventor: Stephen Felix, Simon Christian Knowles, Godfrey Da Costa
CPC classification number: G06F9/30036, G06F9/30018
Abstract: The present invention relates to an execution unit for executing a computer program comprising a sequence of instructions, which include a masking instruction. The execution unit is configured to execute the masking instruction which, when executed by the execution unit, masks randomly selected values from a source operand of n values and retains other original values from the source operand to generate a result which includes original values from the source operand and symbols in place of the selected values.
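The effect of the masking instruction can be sketched in software. This is a behavioural illustration under stated assumptions, not the hardware design: the function name `mask_values`, the `keep_probability` parameter and the choice of `0.0` as the replacement symbol are hypothetical and do not come from the abstract.

```python
import random
from typing import List, Optional


def mask_values(source: List[float],
                keep_probability: float,
                symbol: float = 0.0,
                rng: Optional[random.Random] = None) -> List[float]:
    """Return a copy of `source` with randomly selected entries replaced by `symbol`.

    Entries that are not selected keep their original values, so the result
    mixes original values and symbols, as the abstract describes.
    """
    if not 0.0 <= keep_probability <= 1.0:
        raise ValueError("keep_probability must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    # rng.random() is in [0, 1), so 1.0 keeps everything and 0.0 masks everything.
    return [value if rng.random() < keep_probability else symbol
            for value in source]


operand = [0.5, -1.25, 3.0, 7.5]
print(mask_values(operand, keep_probability=0.5, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the selection reproducible, which is convenient when testing code that depends on the masked result.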
-
Publication No.: US11467833B2
Publication Date: 2022-10-11
Application No.: US16276872
Filing Date: 2019-02-15
Applicant: Graphcore Limited
Abstract: A processor having an instruction set including a load-store instruction having operands specifying, from amongst the registers in at least one register file, a respective destination of each of two load operations, a respective source of a store operation, and a pair of address registers arranged to hold three memory addresses, the three memory addresses being a respective load address for each of the two load operations and a respective store address for the store operation. The load-store instruction further includes three stride operands each specifying a respective stride value for each of the two load addresses and one store address, wherein at least some possible values of each stride operand specify the respective stride value by specifying one of a plurality of fields within a stride register in one of the one or more register files, each field holding a different stride value.
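The combined load-store operation can be modelled behaviourally as below. This is a sketch under assumptions the abstract does not state: addresses are element indices, loads happen before the store, and each stride is applied as a post-increment. The names `load_load_store`, `addresses`, `stride_register` and `stride_fields` are hypothetical.

```python
from typing import List, Tuple


def load_load_store(memory: List[int],
                    addresses: List[int],
                    stride_register: List[int],
                    stride_fields: Tuple[int, int, int],
                    store_value: int) -> Tuple[int, int]:
    """Perform two loads and one store, then advance all three addresses.

    addresses       -- [load_addr_0, load_addr_1, store_addr], updated in place
    stride_register -- fields of one stride register, each holding a stride value
    stride_fields   -- for each address, the index of the field that supplies its stride
    """
    loaded_0 = memory[addresses[0]]
    loaded_1 = memory[addresses[1]]
    memory[addresses[2]] = store_value
    # Assumed post-increment: each address moves by the stride its operand selects.
    for i, field in enumerate(stride_fields):
        addresses[i] += stride_register[field]
    return loaded_0, loaded_1


memory = list(range(10, 20)) + [0, 0, 0, 0]
addresses = [0, 5, 10]
stride_register = [1, 2, 4]
a, b = load_load_store(memory, addresses, stride_register, (0, 1, 0), store_value=99)
print(a, b, addresses, memory[10])
```

Selecting a stride by field index rather than encoding the stride value directly is what lets one instruction carry three stride operands in a few bits each.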
-
Publication No.: US11269806B2
Publication Date: 2022-03-08
Application No.: US16419535
Filing Date: 2019-05-22
Applicant: Graphcore Limited
Inventor: Stephen Felix, Simon Christian Knowles
Abstract: A time deterministic computer is architected so that exchange code compiled for one set of tiles, e.g., a column, can be reused on other sets. The computer comprises: a plurality of processing units each having an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by its associated processing unit; the processing units arranged in columns, each column having a base processing unit proximate the switching fabric and multiple processing units one adjacent the other in respective positions in the direction of the column, wherein to implement exchange of data between the processing units at least one processing unit is configured to transmit at a transmit time a data packet intended for a recipient processing unit onto its output set of connection wires, the data packet having no destination identifier of the recipient processing unit but destined for receipt at the recipient processing unit with a predetermined delay relative to the transmit time, wherein the predetermined delay is dependent on an exchange pathway between the transmitting and recipient processing units, wherein the exchange pathway between any pair of transmitting and recipient processing units at respective positions in one column has the same delay as the exchange pathway between each pair of transmitting and recipient processing units at corresponding respective positions in the other columns.
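The reuse property can be illustrated with a toy latency model. The model is an assumption made for this sketch (one cycle per position towards the fabric, a fixed crossing cost, one cycle per position back); the names `pathway_delay`, `retarget`, `CYCLES_PER_POSITION` and `FABRIC_CROSSING` are hypothetical. The point it shows is that when delay depends only on positions within a column, a schedule written in terms of positions keeps the same timings on every column.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (column, position in column); position 0 is the base tile

# Hypothetical latency model, in clock cycles (assumptions for illustration only).
CYCLES_PER_POSITION = 1
FABRIC_CROSSING = 3


def pathway_delay(sender: Tile, recipient: Tile) -> int:
    """Delay depends on the two positions only, never on the column index."""
    _, sender_pos = sender
    _, recipient_pos = recipient
    return (sender_pos + recipient_pos) * CYCLES_PER_POSITION + FABRIC_CROSSING


def retarget(schedule: List[Tuple[int, int, int]],
             column: int) -> List[Tuple[Tile, Tile, int, int]]:
    """Apply a position-based schedule to `column` without re-timing it.

    Each schedule entry is (sender_position, recipient_position, send_cycle).
    """
    result = []
    for sender_pos, recipient_pos, send_cycle in schedule:
        sender, recipient = (column, sender_pos), (column, recipient_pos)
        receive_cycle = send_cycle + pathway_delay(sender, recipient)
        result.append((sender, recipient, send_cycle, receive_cycle))
    return result


column_schedule = [(0, 2, 10), (3, 1, 12)]
for entry in retarget(column_schedule, column=3):
    print(entry)
```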
-
Publication No.: US11262787B2
Publication Date: 2022-03-01
Application No.: US16744249
Filing Date: 2020-01-16
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
Abstract: The invention relates to a computer implemented method of generating multiple programs to deliver a computerised function, each program to be executed in a processing unit of a computer comprising a plurality of processing units each having instruction storage for holding a local program, an execution unit for executing the local program and data storage for holding data, a switching fabric connected to an output interface of each processing unit and connectable to an input interface of each processing unit by switching circuitry controllable by each processing unit, and a synchronisation module operable to generate a synchronisation signal, the method comprising: generating a local program for each processing unit comprising a sequence of executable instructions; determining for each processing unit a relative time of execution of instructions of each local program whereby a local program allocated to one processing unit is scheduled to execute with a predetermined delay relative to a synchronisation signal a send instruction to transmit at least one data packet at a predetermined transmit time, relative to the synchronisation signal, destined for a recipient processing unit but having no destination identifier, and a local program allocated to the recipient processing unit is scheduled to execute at a predetermined switch time a switch control instruction to control the switching circuitry to connect its processing unit wire to the switching fabric to receive the data packet at a receive time.
-
Publication No.: US11061679B2
Publication Date: 2021-07-13
Application No.: US16389682
Filing Date: 2019-04-19
Applicant: Graphcore Limited
Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.
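A behavioural sketch of the double-load idea follows. It assumes element-indexed memory, a fixed stride of one element, and post-increment addressing; none of these details are given in the abstract, and the names `double_load`, `FIXED_STRIDE` and `stride_register` are hypothetical. The loop uses it for a dot product between contiguous weights and inputs spaced two elements apart.

```python
from typing import List, Tuple

FIXED_STRIDE = 1  # hypothetical fixed stride, in elements


def double_load(memory: List[int],
                fixed_addr: int,
                variable_addr: int,
                variable_stride: int) -> Tuple[int, int, int, int]:
    """Load two values and return them with both advanced addresses.

    The first address advances by the fixed stride; the second advances by
    the value read from a stride register.
    """
    first = memory[fixed_addr]
    second = memory[variable_addr]
    return (first, second,
            fixed_addr + FIXED_STRIDE,
            variable_addr + variable_stride)


# Weights are contiguous at indices 0..2; inputs sit at indices 3, 5, 7.
memory = [1, 2, 3, 10, 0, 20, 0, 30]
weight_addr, input_addr = 0, 3
stride_register = 2  # variable stride held in a register

total = 0
for _ in range(3):
    w, x, weight_addr, input_addr = double_load(
        memory, weight_addr, input_addr, stride_register)
    total += w * x
print(total)
```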
-
Publication No.: US11023239B2
Publication Date: 2021-06-01
Application No.: US16389682
Filing Date: 2019-04-19
Applicant: Graphcore Limited
Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.
-
Publication No.: US10956165B2
Publication Date: 2021-03-23
Application No.: US15885925
Filing Date: 2018-02-01
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles
Abstract: A processor comprising: an execution unit for executing a respective thread in each of a repeating sequence of time slots; and a plurality of context register sets, each comprising a respective set of registers for representing a state of a respective thread. The context register sets comprise a respective worker context register set for each of the number of time slots the execution unit is operable to interleave, and at least one extra context register set. The worker context register sets represent the respective states of worker threads and the extra context register set represents the state of a supervisor thread. The processor is configured to begin running the supervisor thread in each of the time slots, and to enable the supervisor thread to then individually relinquish each of the time slots in which it is running to a respective one of the worker threads.
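The slot hand-over described in this abstract can be modelled with a short simulation. It is a sketch only: the class name `BarrelProcessor`, its methods, and the use of thread names as strings are hypothetical, and register state is not modelled at all.

```python
from typing import List, Optional


class BarrelProcessor:
    """Round-robin time slots; the supervisor initially runs in all of them."""

    def __init__(self, num_slots: int) -> None:
        # None means the slot is still occupied by the supervisor thread.
        self.slots: List[Optional[str]] = [None] * num_slots

    def relinquish(self, slot: int, worker: str) -> None:
        """The supervisor gives up one of its slots to a worker thread."""
        if self.slots[slot] is not None:
            raise RuntimeError(f"slot {slot} is already running a worker")
        self.slots[slot] = worker

    def run(self, cycles: int) -> List[str]:
        """Return the thread that issues in each cycle of the repeating sequence."""
        return [self.slots[cycle % len(self.slots)] or "supervisor"
                for cycle in range(cycles)]


cpu = BarrelProcessor(num_slots=4)
print(cpu.run(4))            # supervisor runs in every slot at start-up
cpu.relinquish(1, "worker_a")
cpu.relinquish(3, "worker_b")
print(cpu.run(6))            # slots 1 and 3 now belong to workers
```

Slots are relinquished one at a time, so the supervisor keeps running in the remaining slots while workers execute in the ones it has handed over.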
-
Publication No.: US10936008B2
Publication Date: 2021-03-02
Application No.: US15886009
Filing Date: 2018-02-01
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection wires, the data packet being destined for at least one recipient processing unit but having no destination identifier, and at a predetermined switch time the recipient processing unit executes a switch control instruction from its local program to control its switching circuitry to connect its input set of wires to the switching fabric to receive the data packet at a receive time, the transmit time, switch time and receive time being governed by the common clock with respect to the synchronisation signal.
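The run-time behaviour of the exchange phase can be sketched as a cycle-by-cycle simulation. It is a toy model with assumptions not found in the abstract: a single `WIRE_DELAY` for every pathway, a switch that takes effect in the cycle it is set, and hypothetical names `Fabric`, `Tile`, `switch_to` and `tick`. Cycles are counted on the common clock from the synchronisation signal.

```python
from typing import Dict, List, Optional, Tuple

WIRE_DELAY = 3  # hypothetical cycles from a sender's output to any recipient's switch


class Fabric:
    """Each sender drives its own wires; packets carry data only, no address."""

    def __init__(self) -> None:
        # (sender, arrival_cycle) -> payload
        self._in_flight: Dict[Tuple[int, int], str] = {}

    def send(self, sender: int, cycle: int, payload: str) -> None:
        self._in_flight[(sender, cycle + WIRE_DELAY)] = payload

    def sample(self, sender: int, cycle: int) -> Optional[str]:
        """What a recipient sees if its switch selects `sender` at `cycle`."""
        return self._in_flight.get((sender, cycle))


class Tile:
    def __init__(self, fabric: Fabric) -> None:
        self.fabric = fabric
        self.listening_to: Optional[int] = None
        self.received: List[str] = []

    def switch_to(self, sender: int) -> None:
        """Switch control: connect the input wires to one sender's wires."""
        self.listening_to = sender

    def tick(self, cycle: int) -> None:
        if self.listening_to is not None:
            payload = self.fabric.sample(self.listening_to, cycle)
            if payload is not None:
                self.received.append(payload)


fabric = Fabric()
on_time, too_late = Tile(fabric), Tile(fabric)
for cycle in range(16):
    if cycle == 10:
        fabric.send(sender=0, cycle=cycle, payload="from tile 0")
        fabric.send(sender=1, cycle=cycle, payload="from tile 1")
    if cycle == 12:
        on_time.switch_to(1)   # connected before the packet arrives at cycle 13
    if cycle == 14:
        too_late.switch_to(0)  # the packet from tile 0 passed at cycle 13
    on_time.tick(cycle)
    too_late.tick(cycle)

print(on_time.received, too_late.received)
```

The second tile receives nothing because it connects after the packet has passed, which is why the switch time has to be fixed relative to the transmit time in advance.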
-