-
公开(公告)号:US11709794B2
公开(公告)日:2023-07-25
申请号:US17451372
申请日:2021-10-19
Applicant: Graphcore Limited
Inventor: Stephen Felix , Richard Luke Southwell Osborne , Alan Graham Alexander
Abstract: Two or more die are stacked together in a stacked integrated circuit device. Each of the processors on these die is able to communicate with other processors on its die by sending data over the switching fabric of its respective die. The mechanism for sending data between processors on the same die (i.e. intradie communication) is reused for sending data between processors on different die (i.e. interdie communication). The reuse of the mechanism is enabled by assigning each processor a vertical neighbour on its opposing die. Each processor has an interdie connection that connects it to the output exchange bus of its neighbour. A processor is able to borrow the output exchange bus of its neighbour by sending data along the output exchange bus of its neighbour.
-
公开(公告)号:US10963003B2
公开(公告)日:2021-03-30
申请号:US16165978
申请日:2018-10-19
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles , Daniel John Pelham Wilkinson , Richard Luke Southwell Osborne , Alan Graham Alexander , Stephen Felix , Jonathan Mangnall , David Lacey
Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection wires, the data packet being destined for at least one recipient processing unit but having no destination identifier, and at a predetermined switch time the recipient processing unit executes a switch control instruction from its local program to control its switching circuitry to connect its input set of wires to the switching fabric to receive the data packet at a receive time, the transmit time and, switch time and receive time being governed by the common clock with respect to the synchronisation signal.
-
公开(公告)号:US10949266B2
公开(公告)日:2021-03-16
申请号:US16538980
申请日:2019-08-13
Applicant: Graphcore Limited
Inventor: David Lacey , Daniel John Pelham Wilkinson , Richard Luke Southwell Osborne , Matthew David Fyles
IPC: G06F16/901 , G06F9/30 , G06F15/167 , G06F15/173 , G06F9/48 , G06F9/52 , G06F9/38 , G06F9/54 , H04L12/801 , H04L29/06 , H04L29/08
Abstract: A system comprising: a first subsystem comprising one or more first processors, and a second subsystem comprising one or more second processors. The second subsystem is configured to process code over a series of steps delineated by barrier synchronizations, and in a current step, to send a descriptor to the first subsystem specifying a value of each of one or more parameters of each of one or more interactions that the second subsystem is programmed to perform with the first subsystem via an inter-processor interconnect in a subsequent step. The first subsystem is configured to execute a portion of code to perform one or more preparatory operations, based on the specified values of at least one of the one or more parameters of each interaction as specified by the descriptor, to prepare for said one or more interactions prior to the barrier synchronization leading into the subsequent phase.
-
公开(公告)号:US10802536B2
公开(公告)日:2020-10-13
申请号:US15886053
申请日:2018-02-01
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles , Daniel John Pelham Wilkinson , Richard Luke Southwell Osborne , Alan Graham Alexander , Stephen Felix , Jonathan Mangnall , David Lacey
Abstract: The invention relates to a computer implemented method of generating multiple programs to deliver a computerised function, each program to be executed in a processing unit of a computer comprising a plurality of processing units each having instruction storage for holding a local program, an execution unit for executing the local program and data storage for holding data, a switching fabric connected to an output interface of each processing unit and connectable to an input interface of each processing unit by switching circuitry controllable by each processing unit, and a synchronisation module operable to generate a synchronisation signal, the method comprising: generating a local program for each processing unit comprising a sequence of executable instructions; determining for each processing unit a relative time of execution of instructions of each local program whereby a local program allocated to one processing unit is scheduled to execute with a predetermined delay relative to a synchronisation signal a send instruction to transmit at least one data packet at a predetermined transmit time, relative to the synchronisation signal, destined for a recipient processing unit but having no destination identifier, and a local program allocated to the recipient processing unit is scheduled to execute at a predetermined switch time a switch control instruction to control the switching circuitry to connect its processing unit wire to the switching fabric to receive the data packet at a receive time.
-
公开(公告)号:US20200012534A1
公开(公告)日:2020-01-09
申请号:US16235515
申请日:2018-12-28
Applicant: Graphcore Limited
Inventor: Ola Tørudbakken , Daniel John Pelham Wilkinson , Richard Luke Southwell Osborne , Brian Manula , Harald Høeg
Abstract: A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host. The gateway enables the transfer of batches of data to the subsystem at precompiled data exchange synchronisation points. The gateway comprises a streaming engine having a data mover engine and a memory management engine, the data mover engine and memory management engine being configured to execute instructions in coordination from work descriptors. The memory management engine is configured to execute instructions from the work descriptor to transfer data between external storage and the local memory associated with the gateway. The data mover engine is configured to execute instructions from the work descriptor to transfer data between the local memory associated with the gateway and the subsystem.
-
公开(公告)号:US20190121785A1
公开(公告)日:2019-04-25
申请号:US15886185
申请日:2018-02-01
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson , Richard Luke Southwell Osborne , Matthew David Fyles , Alan Graham Alexander , Stephen Felix
IPC: G06F15/80 , G06F9/52 , G06F15/173
Abstract: A processing system comprising an arrangement of tiles and synchronization logic in the form of hardware logic for coordinating between a group of some or all of said tiles. The instruction set comprises a synchronization instruction which causes an instance of a synchronization request to be transmitted from the respective tile to the synchronization logic, and suspends instruction issue on the respective tile pending a synchronization acknowledgement. In response to receiving an instance of the synchronization request from all of the tiles of the group, the synchronization logic returns the synchronization acknowledgment back to each of the tiles in the group to allow the instruction issue to resume. The instruction set further comprises an abstain instruction, which sends an instance of the synchronization request but does not suspend instruction issue on the respective tile pending the synchronization acknowledgement, instead allowing the instruction issue on the respective tile to continue.
-
公开(公告)号:US20190121784A1
公开(公告)日:2019-04-25
申请号:US15886138
申请日:2018-02-01
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson , Stephen Felix , Richard Luke Southwell Osborne , Simon Christian Knowles , Alan Graham Alexander , Ian James Quinn
IPC: G06F15/80 , G06F9/52 , G06F15/173
Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.
-
公开(公告)号:US11176066B2
公开(公告)日:2021-11-16
申请号:US16395846
申请日:2019-04-26
Applicant: Graphcore Limited
Inventor: Richard Luke Southwell Osborne , Stephen Felix
Abstract: The present disclosure relates to a method of scheduling messages to be exchanged between tiles in a computer where there is a fixed transmission time between sending and receiving tiles. According to the method a total size of message data to be sent or received by each tile is determined. One of the tiles is selected based at least on the size of the message data to schedule a first message. The first message to be scheduled is selected from the set of messages on that tile. In order to schedule the message the other end points of this selected message are determined, and then respective time slots are allocated at the sending and receiving tiles for that message. The size of the selected message is then deducted from each of the tiles acting as end points for the message, and then the sequence is carried out again until all messages have been scheduled. This technique optimises message exchange in an exchange phase of a BSP system.
-
公开(公告)号:US10963315B2
公开(公告)日:2021-03-30
申请号:US16276834
申请日:2019-02-15
Applicant: Graphcore Limited
Inventor: David Lacey , Daniel John Pelham Wilkinson , Richard Luke Southwell Osborne , Matthew David Fyles
IPC: G06F9/52 , G06F9/30 , G06F9/38 , G06F9/54 , G06F9/48 , G06F16/901 , G06F15/167 , G06F15/173 , H04L12/801 , H04L29/06 , H04L29/08
Abstract: A system comprising: a first subsystem comprising one or more first processors, and a second subsystem comprising one or more second processors. The second subsystem is configured to process code over a series of steps delineated by barrier synchronizations, and in a current step, to send a descriptor to the first subsystem specifying a value of each of one or more parameters of each of one or more interactions that the second subsystem is programmed to perform with the first subsystem via an inter-processor interconnect in a subsequent step. The first subsystem is configured to execute a portion of code to perform one or more preparatory operations, based on the specified values of at least one of the one or more parameters of each interaction as specified by the descriptor, to prepare for said one or more interactions prior to the barrier synchronization leading into the subsequent phase.
-
公开(公告)号:US10817444B2
公开(公告)日:2020-10-27
申请号:US16525833
申请日:2019-07-30
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson , Richard Luke Southwell Osborne , Stephen Felix , Graham Bernard Cunningham , Alan Graham Alexander
Abstract: A system comprising an arrangement of multiple processor modules, and an external interconnect for communicating data in the form of packets to outside the arrangement. The interconnect comprises an exchange block configured to provide flow control. One of the processor modules is arranged to send an exchange request message to the exchange block on behalf of others with data to send outside the arrangement. The exchange block sends an exchange-on message to a first of these processor modules, to cause the first module to start sending packets via the interconnect. Then, once this processor module has sent its last data packet, the exchange block sends an exchange-off message to this processor module to cause it to stop sending packets, and sends another exchange-on message to the next processor module with data to send, and so forth.
-
-
-
-
-
-
-
-
-