Scheduling tasks in a multi-threaded processor

    公开(公告)号:US11550591B2

    公开(公告)日:2023-01-10

    申请号:US17172864

    申请日:2021-02-10

    Abstract: A processor comprising: an execution unit for executing a respective thread in each of a repeating sequence of time slots; and a plurality of context register sets, each comprising a respective set of registers for representing a state of a respective thread. The context register sets comprise a respective worker context register set for each of the number of time slots the execution unit is operable to interleave, and at least one extra context register set. The worker context register sets represent the respective states of worker threads and the extra context register set being represents the state of a supervisor thread. The processor is configured to begin running the supervisor thread in each of the time slots, and to enable the supervisor thread to then individually relinquish each of the time slots in which it is running to a respective one of the worker threads.

    Combining states of multiple threads in a multi threaded processor

    公开(公告)号:US11113060B2

    公开(公告)日:2021-09-07

    申请号:US16529122

    申请日:2019-08-01

    Abstract: A processing apparatus comprising one or more processing modules, each comprising an execution unit. The one or more processing modules are operable to run a plurality of parallel or concurrent threads, and the processing apparatus further comprises a storage location for storing an aggregated exit state of the plurality of threads. An instruction set of the processing apparatus comprises an exit instruction for inclusion in each of the plurality of threads, the exit state instruction taking an individual exit state of the respective thread as an operand. The exit instruction terminates the respective thread and also causes the individual exit state specified in the operand to contribute to the aggregated exit state.

    PARALLEL COMPUTING
    3.
    发明申请
    PARALLEL COMPUTING 审中-公开

    公开(公告)号:US20190121678A1

    公开(公告)日:2019-04-25

    申请号:US15885949

    申请日:2018-02-01

    Abstract: A method for executing a computer program, the method implemented by a processor comprising a plural number of computing units and an interconnect connected to the computing units, wherein each computing unit comprises a processing unit and a memory having at least two memory ports, each port assignable to one or more respective regions of the memory, wherein the method comprises at each computing unit: performing an initial step of the program to write: an initial output value to an output region of the memory, and an initial input value to an input region of the memory; and performing a subsequent step of the program by: in a compute phase: assigning one of the two ports to both the input region and the output region; executing code sequences on the processing unit to compute an output set of one or more new output values, and writing the output set to the output region, the output set computed from the initial output and initial input values, each of which is retrieved via said one port in the compute phase; when the compute phase has completed, in an exchange phase: assigning a first of the two ports to the output region and a second of the two ports to input region; and retrieving a new output value of the output set from the output region via said first port and sending the retrieved value to a different computing unit via the interconnect, and receiving via the interconnect a new input value which has been computed by the a different computing unit in the subsequent step and writing the received value to the input region via said second port.

    PROCESSING IN NEURAL NETWORKS
    4.
    发明申请

    公开(公告)号:US20190121639A1

    公开(公告)日:2019-04-25

    申请号:US15886331

    申请日:2018-02-01

    Abstract: The present invention relates to an execution unit for executing a computer program comprising a sequence of instructions, which include a masking instruction. The execution unit is configured to execute the masking instruction which when executed by the execution unit masks randomly selected values from a source operand of n values and retains other original values from the source operand to generate a result which includes original values from the source operand and the masked values in their respective original locations.

    INSTRUCTION CACHE IN A MULTI-THREADED PROCESSOR

    公开(公告)号:US20200210192A1

    公开(公告)日:2020-07-02

    申请号:US16276895

    申请日:2019-02-15

    Abstract: A processor comprising: a barrel-threaded execution unit for executing concurrent threads, and a repeat cache shared between the concurrent threads. The processor's instruction set includes a repeat instruction which takes a repeat count operand. When the repeat cache is not claimed and the repeat instruction is executed in a first thread, a portion of code is cached from the first thread into the repeat cache, the state of the repeat cache is changed to record it as claimed, and the cached code is executed a number of times. When the repeat instruction is then executed in a further thread, then the already-cached portion of code is again executed a respective number of times, each time from the repeat cache. For each of the first and further instructions, the repeat count operand in the respective instruction specifies the number of times to execute the cached code.

    COMPILER METHOD
    6.
    发明申请
    COMPILER METHOD 审中-公开

    公开(公告)号:US20200150713A1

    公开(公告)日:2020-05-14

    申请号:US16744249

    申请日:2020-01-16

    Abstract: The invention relates to a computer implemented method of generating multiple programs to deliver a computerised function, each program to be executed in a processing unit of a computer comprising a plurality of processing units each having instruction storage for holding a local program, an execution unit for executing the local program and data storage for holding data, a switching fabric connected to an output interface of each processing unit and connectable to an input interface of each processing unit by switching circuitry controllable by each processing unit, and a synchronisation module operable to generate a synchronisation signal, the method comprising: generating a local program for each processing unit comprising a sequence of executable instructions; determining for each processing unit a relative time of execution of instructions of each local program whereby a local program allocated to one processing unit is scheduled to execute with a predetermined delay relative to a synchronisation signal a send instruction to transmit at least one data packet at a predetermined transmit time, relative to the synchronisation signal, destined for a recipient processing unit but having no destination identifier, and a local program allocated to the recipient processing unit is scheduled to execute at a predetermined switch time a switch control instruction to control the switching circuitry to connect its processing unit wire to the switching fabric to receive the data packet at a receive time.

    SYNCHRONIZATION IN A MULTI-TILE PROCESSING ARRAY

    公开(公告)号:US20190155328A1

    公开(公告)日:2019-05-23

    申请号:US16165978

    申请日:2018-10-19

    Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection wires, the data packet being destined for at least one recipient processing unit but having no destination identifier, and at a predetermined switch time the recipient processing unit executes a switch control instruction from its local program to control its switching circuitry to connect its input set of wires to the switching fabric to receive the data packet at a receive time, the transmit time and, switch time and receive time being governed by the common clock with respect to the synchronisation signal.

    INSTRUCTION SET
    8.
    发明申请
    INSTRUCTION SET 审中-公开

    公开(公告)号:US20190121777A1

    公开(公告)日:2019-04-25

    申请号:US15886131

    申请日:2018-02-01

    Abstract: The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instruction which, when executed, implements: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.

    SCHEDULING TASKS IN A MULTI-THREADED PROCESSOR

    公开(公告)号:US20190121668A1

    公开(公告)日:2019-04-25

    申请号:US15885925

    申请日:2018-02-01

    Abstract: A processor comprising: an execution unit for executing a respective thread in each of a repeating sequence of time slots; and a plurality of context register sets, each comprising a respective set of registers for representing a state of a respective thread. The context register sets comprise a respective worker context register set for each of the number of time slots the execution unit is operable to interleave, and at least one extra context register set. The worker context register sets represent the respective states of worker threads and the extra context register set being represents the state of a supervisor thread. The processor is configured to begin running the supervisor thread in each of the time slots, and to enable the supervisor thread to then individually relinquish each of the time slots in which it is running to a respective one of the worker threads.

    SYNCHRONIZATION IN A MULTI-TILE PROCESSING ARRAY

    公开(公告)号:US20190121387A1

    公开(公告)日:2019-04-25

    申请号:US15886009

    申请日:2018-02-01

    Abstract: The invention relates to a computer comprising: a plurality of processing units each having instruction storage holding a local program, an execution unit executing the local program, data storage for holding data; an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by each processing unit; a synchronisation module operable to generate a synchronisation signal to control the computer to switch between a compute phase and an exchange phase, wherein the processing units are configured to execute their local programs according to a common clock, the local programs being such that in the exchange phase at least one processing unit executes a send instruction from its local program to transmit at a transmit time a data packet onto its output set of connection wires, the data packet being destined for at least one recipient processing unit but having no destination identifier, and at a predetermined switch time the recipient processing unit executes a switch control instruction from its local program to control its switching circuitry to connect its input set of wires to the switching fabric to receive the data packet at a receive time, the transmit time and, switch time and receive time being governed by the common clock with respect to the synchronisation signal.

Patent Agency Ranking