-
Publication No.: US20200012482A1
Publication Date: 2020-01-09
Application No.: US16235109
Filing Date: 2018-12-28
Applicant: Graphcore Limited
Inventor: Ola Tørudbakken, Daniel John Pelham Wilkinson, Brian Manula, Harald Høeg
Abstract: A computer system comprises a work accelerator and a gateway enabling the transfer of data to the accelerator from external storage. The accelerator executes a first compiled code sequence to perform computations on data transferred to the accelerator from the gateway. The first compiled code sequence comprises a synchronisation instruction indicating a barrier between a compute phase, in which compute instructions are executed, and an exchange phase, wherein execution of the synchronisation instruction causes an indication of a pre-compiled data exchange synchronisation point to be transferred to the gateway. The gateway comprises a streaming engine storing a second compiled code sequence in the form of a set of data transfer instructions executable by the streaming engine to perform data transfer operations to stream data through the gateway in the exchange phase, wherein the first and second compiled code sequences are generated as a related set at compile time.
-
Publication No.: US20190310963A1
Publication Date: 2019-10-10
Application No.: US16451128
Filing Date: 2019-06-25
Applicant: Graphcore Limited
Inventor: Stephen Felix, Jonathan Mangnall
IPC: G06F15/173, G06F1/32, G06F15/80
Abstract: An indication of a direction of transmission over the switching fabric is inserted into a data packet that is transmitted from a tile. The indication of direction may indicate directions from the transmitting tile in which intended recipient tiles are present. The switching fabric prevents (e.g. by blocking the data packet at one of a series of latches) the transmission in a direction not indicated in the data packet. Hence, power saving may be achieved, by preventing the unnecessary transmission of data packets over parts of the switching fabric.
-
Publication No.: US20190121785A1
Publication Date: 2019-04-25
Application No.: US15886185
Filing Date: 2018-02-01
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
IPC: G06F15/80, G06F9/52, G06F15/173
Abstract: A processing system comprising an arrangement of tiles and synchronization logic in the form of hardware logic for coordinating between a group of some or all of said tiles. The instruction set comprises a synchronization instruction which causes an instance of a synchronization request to be transmitted from the respective tile to the synchronization logic, and suspends instruction issue on the respective tile pending a synchronization acknowledgement. In response to receiving an instance of the synchronization request from all of the tiles of the group, the synchronization logic returns the synchronization acknowledgment back to each of the tiles in the group to allow the instruction issue to resume. The instruction set further comprises an abstain instruction, which sends an instance of the synchronization request but does not suspend instruction issue on the respective tile pending the synchronization acknowledgement, instead allowing the instruction issue on the respective tile to continue.
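The difference between the synchronization instruction and the abstain instruction can be illustrated with a small software model. This is only a sketch: the class `SyncLogic` and its method names are hypothetical, and the patent describes hardware logic, not Python objects.

```python
class SyncLogic:
    """Toy model of synchronization logic for one group of tiles (hypothetical names)."""

    def __init__(self, group):
        self.group = frozenset(group)
        self.requests = set()    # tiles whose sync request has arrived
        self.suspended = set()   # tiles whose instruction issue is paused

    def _request(self, tile, suspend):
        if tile not in self.group:
            raise ValueError(f"tile {tile} is not in the sync group")
        if suspend:
            self.suspended.add(tile)
        self.requests.add(tile)
        # Once every tile in the group has sent a request, acknowledge:
        # all suspended tiles resume and the barrier resets.
        if self.requests == self.group:
            released = sorted(self.suspended)
            self.requests.clear()
            self.suspended.clear()
            return released
        return []

    def sync(self, tile):
        """Send a request and suspend instruction issue until the acknowledgement."""
        return self._request(tile, suspend=True)

    def abstain(self, tile):
        """Send a request but keep issuing instructions."""
        return self._request(tile, suspend=False)
```

In this model each call returns the list of tiles released by the acknowledgement, which is empty until the last member of the group has sent its request. An abstaining tile counts toward the barrier without ever appearing in `suspended`.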
-
Publication No.: US20190121784A1
Publication Date: 2019-04-25
Application No.: US15886138
Filing Date: 2018-02-01
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson, Stephen Felix, Richard Luke Southwell Osborne, Simon Christian Knowles, Alan Graham Alexander, Ian James Quinn
IPC: G06F15/80, G06F9/52, G06F15/173
Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.
-
Publication No.: US20190121616A1
Publication Date: 2019-04-25
Application No.: US15886505
Filing Date: 2018-02-01
Applicant: Graphcore Limited
Inventor: Stephen Felix, Godfrey Da Costa
Abstract: The present invention relates to an execution unit configured to execute a computer program instruction to generate random numbers based on a predetermined probability distribution. The execution unit comprises a hardware pseudorandom number generator configured to generate at least one randomised bit string on execution of the instruction, and adding circuitry which is configured to receive a number of bit sequences of a predetermined bit length selected from the randomised bit string and to sum them to produce a result.
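The summing step can be sketched in software as follows. The function names, the 64-bit string width, the 5-bit field length, and the count of 12 fields are all illustrative assumptions, not values taken from the abstract. Summing several independent uniform fields yields a bell-shaped distribution, but the abstract does not say which distribution the hardware targets.

```python
import random


def sum_of_bit_fields(bits: int, total_bits: int, field_bits: int, count: int) -> int:
    """Sum `count` adjacent fields of `field_bits` bits taken from `bits`."""
    if count * field_bits > total_bits:
        raise ValueError("not enough bits for the requested fields")
    mask = (1 << field_bits) - 1
    return sum((bits >> (i * field_bits)) & mask for i in range(count))


def sample(rng: random.Random, total_bits: int = 64,
           field_bits: int = 5, count: int = 12) -> int:
    """Draw one randomised bit string and sum its fields (assumed parameters)."""
    return sum_of_bit_fields(rng.getrandbits(total_bits), total_bits, field_bits, count)
```

With the assumed defaults, each result lies between 0 and 12 × 31 = 372, with values near the middle of that range being the most likely.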
-
Publication No.: US12164637B2
Publication Date: 2024-12-10
Application No.: US17338942
Filing Date: 2021-06-04
Applicant: Graphcore Limited
Inventor: Daniel John Pelham Wilkinson
Abstract: A new apparatus and method for securely distributing an application to processors of a processing unit. The processing unit is formed as part of an integrated circuit and comprises a plurality of processors (referred to as tiles), each having their own execution unit and storage for storing application data and additional executable instructions. The integrated circuit comprises a hardware module (referred to herein as the autoloader) that is configured to distribute a set of bootloader instructions (referred to herein as a secondary bootloader) to each of at least some of the tiles. Each of the tiles then executes instructions of the received secondary bootloader, which causes each tile to issue read requests to read a set of executable application instructions from a memory external to the integrated circuit. Each tile then performs operations using the received set of executable application instructions so as to execute the application using the processing unit.
-
Publication No.: US20240378261A1
Publication Date: 2024-11-14
Application No.: US18658303
Filing Date: 2024-05-08
Applicant: Graphcore Limited
Inventor: Badreddine NOUNE, Godfrey DA COSTA, Carlo LUSCHI
Abstract: An execution unit, the execution unit having access to a local memory storing a lookup table with a plurality of entries, each entry comprising an x value and corresponding y value representative of a point on a curve of a cumulative distribution function, CDF, consecutive entries of the plurality of entries forming an interval of the CDF, the execution unit being configured to: receive one or more computer program instructions, and in response: generate a random number using random number generation hardware associated with the execution unit, determine, based on the lookup table, the interval of the CDF in which the generated random number falls, and interpolate between y values of entries forming the interval based on the generated random number to generate a random sample of the CDF.
-
Publication No.: US20240378260A1
Publication Date: 2024-11-14
Application No.: US18657191
Filing Date: 2024-05-07
Applicant: Graphcore Limited
Inventor: Badreddine NOUNE, Godfrey DA COSTA, Carlo LUSCHI
Abstract: An execution unit configured to: receive a first computer program instruction to populate a lookup table with a plurality of entries, each entry comprising an x value and corresponding y value representative of a point on a curve of a function, consecutive entries of the plurality of entries forming an interval of the function, populate a lookup table stored in a local memory associated with the execution unit with the plurality of entries, receive a second computer program instruction, the second computer program instruction indicating an input value, determine, based on the lookup table, the interval of the function in which the input value falls, and interpolate between y values of entries forming the interval to generate an output value corresponding to the input value.
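The populate-then-interpolate flow can be sketched in software as below. The helper names `make_table` and `lookup` are hypothetical, and linear interpolation is an assumption, since the abstract only says the unit interpolates between the y values of the interval.

```python
from bisect import bisect_right


def make_table(fn, xs):
    """Populate a lookup table of (x, y) points on the curve of `fn`."""
    return [(x, fn(x)) for x in sorted(xs)]


def lookup(table, x):
    """Find the interval containing `x` and interpolate linearly between its y values."""
    if len(table) < 2:
        raise ValueError("the table needs at least two entries")
    xs = [point[0] for point in table]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("input value lies outside the table range")
    # Index of the entry that closes the interval; clamp so x == xs[-1] works.
    i = min(bisect_right(xs, x), len(table) - 1)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)
```

For example, a table built from `lambda v: v * v` at x = 0, 1, 2 returns 2.5 for an input of 1.5, the midpoint between the stored y values 1 and 4.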
-
Publication No.: US12073262B2
Publication Date: 2024-08-27
Application No.: US17338898
Filing Date: 2021-06-04
Applicant: Graphcore Limited
Inventor: Ola Torudbakken, Wei-Lin Guay
IPC: G06F9/52, G06F9/38, G06F9/54, G06F15/173
CPC classification number: G06F9/522, G06F9/3851, G06F9/543, G06F9/544, G06F15/173, G06F15/17325
Abstract: A host system compiles a set of local programs which are provided over a network to a plurality of subsystems. By defining the synchronisation activity on the host, and then providing that information to the subsystems, the host can service a large number of subsystems. The defined synchronisation activity includes defining the synchronisation groups between which synchronisation barriers occur and the points during program execution at which data exchange with the host occurs. Defining synchronisation activity between the subsystems allows a large number of subsystems to be connected whilst minimising the required exchanges with the host.
-
Publication No.: US20240201988A1
Publication Date: 2024-06-20
Application No.: US18543036
Filing Date: 2023-12-18
Applicant: Graphcore Limited
Inventor: Mark SHEPPARD
CPC classification number: G06F9/30032, G06F1/08, G06F9/30038
Abstract: An execution unit performs a byte-wise rotation of an input data block. An input data array receives an input data block. Two first layer multiplexer arrays each receive a first layer data block comprising a respective subset of bytes of the input data block and a first layer control signal, and rotate the first layer data block by an amount indicated by the first layer control signal. A second layer multiplexer array receives a second layer control signal and selects between corresponding bytes of the first and second rotated first layer data blocks based on the second layer control signal. The execution unit also includes a control signal generator, configured to generate the first layer control signal and second layer control signal based on a received computer program instruction. Results of smaller block rotations are thus used as partial results for larger block rotation, avoiding large multiplexer arrays with complex wiring.
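The two-layer idea can be modelled in software as follows. This is a sketch under assumptions: the function name is hypothetical, the block is split into two equal halves, and rotation is taken to move bytes toward lower indices. The abstract does not fix any of these details.

```python
def rotate_bytes_two_layer(block: bytes, amount: int) -> bytes:
    """Rotate `block` by `amount` bytes using two half-block rotations plus a select."""
    n = len(block)
    if n == 0 or n % 2:
        raise ValueError("block length must be a positive even number")
    half = n // 2
    r = amount % n

    # First layer: rotate each half independently by the same reduced amount.
    s = r % half
    lo, hi = block[:half], block[half:]
    rot_lo = lo[s:] + lo[:s]
    rot_hi = hi[s:] + hi[:s]

    # Second layer: each output byte picks the corresponding byte of one
    # rotated half, depending on which half its source byte came from.
    out = bytearray(n)
    for i in range(n):
        source = (i + r) % n
        out[i] = rot_hi[i % half] if source >= half else rot_lo[i % half]
    return bytes(out)
```

The per-byte choice in the loop plays the role of the second layer control signal, so no single stage has to route every input byte to every output position.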