-
公开(公告)号:US11593185B2
公开(公告)日:2023-02-28
申请号:US16688305
申请日:2019-11-19
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles , Alan Graham Alexander
IPC: G06F9/44 , G06F9/52 , G06F9/30 , G06F9/38 , G06F15/80 , G06F15/173 , G06F9/46 , G06N20/00 , G06F9/48
Abstract: A processing system comprising multiple tiles and an interconnect between the tiles. The interconnect is used to communicate between a group of some or all of the tiles according to a bulk synchronous parallel scheme, whereby each tile in the group performs an on-tile compute phase followed by an inter-tile exchange phase with the exchange phase being held back until all tiles in the group have completed the compute phase. Each tile in the group has a local exit state upon completion of the compute phase. The instruction set comprises a synchronization instruction for execution by each tile upon completion of its compute phase to signal a sync request to logic in the interconnect. In response to receiving the sync request from all the tiles in the group, the logic releases the next exchange phase and also makes available an aggregated a state of all the tiles in the group.
-
公开(公告)号:US11586483B2
公开(公告)日:2023-02-21
申请号:US17320904
申请日:2021-05-14
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson , Simon Christian Knowles , Matthew David Fyles , Alan Graham Alexander , Stephen Felix
Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction cause a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.
-
公开(公告)号:US11321272B2
公开(公告)日:2022-05-03
申请号:US15886131
申请日:2018-02-01
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles , Daniel John Pelham Wilkinson , Richard Luke Southwell Osborne , Alan Graham Alexander , Stephen Felix , Jonathan Mangnall , David Lacey
IPC: G06F15/173 , G06F8/41 , G06F9/38 , G06F15/80
Abstract: The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instruction which, when executed, implements: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.
-
公开(公告)号:US20210271527A1
公开(公告)日:2021-09-02
申请号:US17320904
申请日:2021-05-14
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson , Simon Christian Knowles , Matthew David Fyles , Alan Graham Alexander , Stephen Felix
Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction cause a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.
-
公开(公告)号:US11023413B2
公开(公告)日:2021-06-01
申请号:US16725313
申请日:2019-12-23
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson , Stephen Felix , Richard Luke Southwell Osborne , Simon Christian Knowles , Alan Graham Alexander , Ian James Quinn
IPC: G06F15/80 , G06F9/52 , G06F15/173
Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.
-
公开(公告)号:US20190354370A1
公开(公告)日:2019-11-21
申请号:US16529122
申请日:2019-08-01
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles
Abstract: A processing apparatus comprising one or more processing modules, each comprising an execution unit. The one or more processing modules are operable to run a plurality of parallel or concurrent threads, and the processing apparatus further comprises a storage location for storing an aggregated exit state of the plurality of threads. An instruction set of the processing apparatus comprises an exit instruction for inclusion in each of the plurality of threads, the exit state instruction taking an individual exit state of the respective thread as an operand. The exit instruction terminates the respective thread and also causes the individual exit state specified in the operand to contribute to the aggregated exit state.
-
公开(公告)号:US10452396B2
公开(公告)日:2019-10-22
申请号:US15885955
申请日:2018-02-01
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles
Abstract: A processor comprising: an execution unit, multiple context register sets, a scheduler arranged to control the execution unit to provide a repeating sequence of temporally interleaved time slots, thereby enabling at least one respective worker thread to be allocated for execution in each respective one of some or all of the time slots, wherein a program state of the respective worker thread currently executing in each time slot is maintained in a respective one of the context register sets; and an exit state register arranged to store an aggregated exit state the worker threads. The instruction set comprises an exit instruction for inclusion in each worker thread, the exit state instruction taking an individual exit state of the respective thread as an operand. The exit instruction terminates the respective worker and also cause the individual exit state specified in the operand to contribute to the aggregated exit state.
-
公开(公告)号:US20190155648A1
公开(公告)日:2019-05-23
申请号:US16166000
申请日:2018-10-19
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles
Abstract: A processor comprising: an execution unit for executing a respective thread in each of a repeating sequence of time slots; and a plurality of context register sets, each comprising a respective set of registers for representing a state of a respective thread. The context register sets comprise a respective worker context register set for each of the number of time slots the execution unit is operable to interleave, and at least one extra context register set. The worker context register sets represent the respective states of worker threads and the extra context register set being represents the state of a supervisor thread. The processor is configured to begin running the supervisor thread in each of the time slots, and to enable the supervisor thread to then individually relinquish each of the time slots in which it is running to a respective one of the worker threads.
-
公开(公告)号:US20190121679A1
公开(公告)日:2019-04-25
申请号:US15885972
申请日:2018-02-01
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson , Simon Christian Knowles , Matthew David Fyles , Alan Graham Alexander , Stephen Felix
Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction cause a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.
-
公开(公告)号:US20190121638A1
公开(公告)日:2019-04-25
申请号:US15885955
申请日:2018-02-01
Applicant: Graphcore Limited
Inventor: Simon Christian Knowles
Abstract: A processor comprising: an execution unit, multiple context register sets, a scheduler arranged to control the execution unit to provide a repeating sequence of temporally interleaved time slots, thereby enabling at least one respective worker thread to be allocated for execution in each respective one of some or all of the time slots, wherein a program state of the respective worker thread currently executing in each time slot is maintained in a respective one of the context register sets; and an exit state register arranged to store an aggregated exit state the worker threads. The instruction set comprises an exit instruction for inclusion in each worker thread, the exit state instruction taking an individual exit state of the respective thread as an operand. The exit instruction terminates the respective worker and also cause the individual exit state specified in the operand to contribute to the aggregated exit state.
-
-
-
-
-
-
-
-
-