CODE COMPILATION FOR SCALING ACCELERATORS
    51.
    Invention application

    Publication No.: US20200012482A1

    Publication date: 2020-01-09

    Application No.: US16235109

    Filing date: 2018-12-28

    Abstract: A computer system comprises a work accelerator and a gateway for the transfer of data to the accelerator from external storage. The accelerator executes a first compiled code sequence to perform computations on data transferred to the accelerator from the gateway. The first compiled code sequence comprises a synchronisation instruction indicating a barrier between a compute phase, in which compute instructions are executed, and an exchange phase, wherein execution of the synchronisation instruction causes an indication of a pre-compiled data exchange synchronisation point to be transferred to the gateway. The gateway comprises a streaming engine storing a second compiled code sequence in the form of a set of data transfer instructions executable by the streaming engine to perform data transfer operations to stream data through the gateway in the exchange phase. The first and second compiled code sequences are generated as a related set at compile time.

    DIRECTION INDICATOR
    52.
    Invention application
    Status: Under examination (published)

    Publication No.: US20190310963A1

    Publication date: 2019-10-10

    Application No.: US16451128

    Filing date: 2019-06-25

    Abstract: An indication of a direction of transmission over a switching fabric is inserted into a data packet that is transmitted from a tile. The indication of direction may indicate the directions from the transmitting tile in which intended recipient tiles are present. The switching fabric prevents (e.g. by blocking the data packet at one of a series of latches) transmission in a direction not indicated in the data packet. Hence, power may be saved by preventing the unnecessary transmission of data packets over parts of the switching fabric.
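
The blocking behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the fabric is modelled as a one-dimensional line of positions, and the names `Packet`, `make_packet` and `driven_segments` are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: int        # position of the transmitting tile on the (assumed 1-D) fabric
    go_left: bool   # True if at least one recipient sits at a lower position
    go_right: bool  # True if at least one recipient sits at a higher position


def make_packet(src, recipients):
    """Build a packet whose direction flags reflect where recipients are."""
    return Packet(
        src=src,
        go_left=any(r < src for r in recipients),
        go_right=any(r > src for r in recipients),
    )


def driven_segments(packet, n_positions, use_indicator=True):
    """Count the fabric segments the packet is driven along.

    Without the indicator the packet travels to both ends of the fabric.
    With it, travel in a direction that is not flagged is blocked at the
    first latch, so that side contributes no segments.
    """
    left = packet.src                       # segments between position 0 and src
    right = n_positions - 1 - packet.src    # segments between src and the far end
    if not use_indicator:
        return left + right
    return (left if packet.go_left else 0) + (right if packet.go_right else 0)
```

For a tile at position 2 on a 10-position fabric with recipients only at higher positions, the model drives 7 segments with the indicator and 9 without it, which is the kind of saving the abstract refers to.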

    SYNCHRONIZATION IN A MULTI-TILE PROCESSING ARRANGEMENT

    Publication No.: US20190121785A1

    Publication date: 2019-04-25

    Application No.: US15886185

    Filing date: 2018-02-01

    Abstract: A processing system comprises an arrangement of tiles and synchronization logic, in the form of hardware logic, for coordinating between a group of some or all of said tiles. The instruction set of the tiles comprises a synchronization instruction which causes an instance of a synchronization request to be transmitted from the respective tile to the synchronization logic, and suspends instruction issue on the respective tile pending a synchronization acknowledgement. In response to receiving an instance of the synchronization request from all of the tiles of the group, the synchronization logic returns the synchronization acknowledgement to each of the tiles in the group to allow instruction issue to resume. The instruction set further comprises an abstain instruction, which sends an instance of the synchronization request but does not suspend instruction issue on the respective tile pending the synchronization acknowledgement, instead allowing instruction issue on the respective tile to continue.
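
The difference between the synchronization instruction and the abstain instruction can be sketched as a small software model. This is an illustration only: the class `SyncLogic` and its method names are hypothetical, and the real mechanism is hardware logic rather than Python objects.

```python
class SyncLogic:
    """Toy model of the synchronization logic for one group of tiles."""

    def __init__(self, group):
        self.group = set(group)   # tiles that must all request before an ack
        self.requested = set()    # tiles whose request has been received
        self.suspended = set()    # tiles whose instruction issue is paused

    def sync(self, tile):
        """Tile sends a request and suspends instruction issue."""
        self.suspended.add(tile)
        return self._request(tile)

    def abstain(self, tile):
        """Tile sends a request but keeps issuing instructions."""
        return self._request(tile)

    def _request(self, tile):
        if tile not in self.group:
            raise ValueError(f"tile {tile} is not in the sync group")
        self.requested.add(tile)
        if self.requested == self.group:
            # All requests are in: acknowledge, releasing suspended tiles.
            released = sorted(self.suspended)
            self.requested.clear()
            self.suspended.clear()
            return released
        return []  # barrier not yet complete, nobody resumes
```

In this model an abstaining tile still counts towards completing the barrier, so the other tiles are not held up by it, but it never appears in the list of tiles that had to wait.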

    GENERATING RANDOMNESS IN NEURAL NETWORKS
    55.
    Invention application

    Publication No.: US20190121616A1

    Publication date: 2019-04-25

    Application No.: US15886505

    Filing date: 2018-02-01

    Abstract: The present invention relates to an execution unit configured to execute a computer program instruction to generate random numbers based on a predetermined probability distribution. The execution unit comprises a hardware pseudorandom number generator configured to generate at least one randomised bit string on execution of the instruction, and adding circuitry which is configured to receive a number of bit sequences of a predetermined bit length selected from the randomised bit string and to sum them to produce a result.
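
The summing step can be sketched in software. This is a minimal illustration, not the patented circuit: the field count (12) and field width (5 bits) are assumptions chosen for the example, and Python's `random.Random` stands in for the hardware pseudorandom number generator. The sum of several independent uniform fields is approximately bell shaped, which is one way a sum of this kind could follow a predetermined distribution.

```python
import random


def summed_bit_fields(bits, n_fields=12, field_width=5):
    """Split `bits` into n_fields fields of field_width bits and sum them."""
    mask = (1 << field_width) - 1
    return sum((bits >> (i * field_width)) & mask for i in range(n_fields))


def sample(rng, n_fields=12, field_width=5):
    """Draw one randomised bit string and reduce it to a single sum."""
    bits = rng.getrandbits(n_fields * field_width)
    return summed_bit_fields(bits, n_fields, field_width)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded software stand-in for the hardware PRNG
    draws = [sample(rng) for _ in range(10_000)]
    print(min(draws), sum(draws) / len(draws), max(draws))
```

With these parameters each result lies between 0 and 372, and the draws cluster around the midpoint of that range.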

    Hardware autoloader
    56.
    Granted invention patent

    Publication No.: US12164637B2

    Publication date: 2024-12-10

    Application No.: US17338942

    Filing date: 2021-06-04

    Abstract: A new apparatus and method are provided for securely distributing an application to processors of a processing unit. The processing unit is formed as part of an integrated circuit and comprises a plurality of processors (referred to as tiles), each having its own execution unit and storage for storing application data and additional executable instructions. The integrated circuit comprises a hardware module (referred to herein as the autoloader) that is configured to distribute a set of bootloader instructions (referred to herein as a secondary bootloader) to each of at least some of the tiles. Each of the tiles then executes instructions of the received secondary bootloader, which causes each tile to issue read requests to read a set of executable application instructions from a memory external to the integrated circuit. Each tile then performs operations using the received set of executable application instructions so as to execute the application using the processing unit.

    EXECUTION UNIT, PROCESSING DEVICE AND METHOD OF GENERATING RANDOM SAMPLES

    Publication No.: US20240378261A1

    Publication date: 2024-11-14

    Application No.: US18658303

    Filing date: 2024-05-08

    Abstract: An execution unit, the execution unit having access to a local memory storing a lookup table with a plurality of entries, each entry comprising an x value and corresponding y value representative of a point on a curve of a cumulative distribution function, CDF, consecutive entries of the plurality of entries forming an interval of the CDF, the execution unit being configured to: receive one or more computer program instructions, and in response: generate a random number using random number generation hardware associated with the execution unit, determine, based on the lookup table, the interval of the CDF in which the generated random number falls, and interpolate between y values of entries forming the interval based on the generated random number to generate a random sample of the CDF.
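
The lookup-and-interpolate step can be sketched as follows. The abstract does not spell out the table layout, so this sketch assumes that the x values are ascending cumulative probabilities and the y values are the corresponding sample values; the function name `sample_from_table` and the use of Python's `random` module in place of the random number generation hardware are also assumptions.

```python
import bisect
import random


def sample_from_table(xs, ys, u):
    """Map a random number u to a sample by interval search plus interpolation.

    xs: ascending x values of the table entries (assumed cumulative probabilities)
    ys: y values of the same entries (assumed sample values)
    u:  random number, expected to lie within [xs[0], xs[-1]]
    """
    # Find the interval [xs[i], xs[i + 1]] that contains u.
    i = bisect.bisect_right(xs, u) - 1
    i = min(max(i, 0), len(xs) - 2)
    # Interpolate linearly between the y values at the interval ends.
    t = (u - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0]
    ys = [0.0, 1.0, 4.0]
    rng = random.Random(0)
    print([round(sample_from_table(xs, ys, rng.random()), 3) for _ in range(5)])
```

A finer table gives a closer fit to the target distribution at the cost of more local memory, which is the usual trade-off with this kind of piecewise-linear scheme.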

    EXECUTION UNIT, PROCESSING DEVICE AND METHOD FOR APPROXIMATING A FUNCTION

    Publication No.: US20240378260A1

    Publication date: 2024-11-14

    Application No.: US18657191

    Filing date: 2024-05-07

    Abstract: An execution unit configured to: receive a first computer program instruction to populate a lookup table with a plurality of entries, each entry comprising an x value and corresponding y value representative of a point on a curve of a function, consecutive entries of the plurality of entries forming an interval of the function, populate a lookup table stored in a local memory associated with the execution unit with the plurality of entries, receive a second computer program instruction, the second computer program instruction indicating an input value, determine, based on the lookup table, the interval of the function in which the input value falls, and interpolate between y values of entries forming the interval to generate an output value corresponding to the input value.
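
The two-instruction flow (populate the table, then evaluate an input) can be modelled in a few lines. This is a sketch only: the class `LookupTableUnit` and its methods `populate` and `evaluate` are hypothetical stand-ins for the first and second instructions, and linear interpolation is assumed because the abstract does not name the interpolation scheme.

```python
import bisect


class LookupTableUnit:
    """Toy model of an execution unit with a lookup table in local memory."""

    def __init__(self):
        self.xs = []
        self.ys = []

    def populate(self, entries):
        """Model of the first instruction: store (x, y) points sorted by x."""
        entries = sorted(entries)
        self.xs = [x for x, _ in entries]
        self.ys = [y for _, y in entries]

    def evaluate(self, x):
        """Model of the second instruction: find the interval, interpolate."""
        if len(self.xs) < 2:
            raise ValueError("table needs at least two entries")
        i = bisect.bisect_right(self.xs, x) - 1
        # Clamp so inputs outside the table reuse the nearest end interval.
        i = min(max(i, 0), len(self.xs) - 2)
        x0, x1 = self.xs[i], self.xs[i + 1]
        y0, y1 = self.ys[i], self.ys[i + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

For example, after populating the unit with the points (0, 0), (1, 1) and (2, 4) taken from the curve y = x², evaluating at 1.5 returns 2.5, compared with the exact value 2.25; more entries would narrow that gap.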

    Rotating Data Blocks
    60.
    Invention publication

    Publication No.: US20240201988A1

    Publication date: 2024-06-20

    Application No.: US18543036

    Filing date: 2023-12-18

    Inventor: Mark SHEPPARD

    CPC classification number: G06F9/30032 G06F1/08 G06F9/30038

    Abstract: An execution unit performs a byte-wise rotation of an input data block. An input data array receives an input data block. Two first layer multiplexer arrays each receive a first layer data block, comprising a respective subset of bytes of the input data block, and a first layer control signal, and rotate the first layer data block by an amount indicated by the first layer control signal. A second layer multiplexer array receives a second layer control signal and selects, for each byte position, between the corresponding bytes of the two rotated first layer data blocks based on the second layer control signal. The execution unit also includes a control signal generator, configured to generate the first layer control signal and second layer control signal based on a received computer program instruction. Results of smaller block rotations are thus used as partial results for a larger block rotation, avoiding large multiplexer arrays with complex wiring.
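
The idea of building a large rotation from two half-size rotations plus a per-byte selection can be checked in software. The sketch below makes several assumptions that are not in the abstract: the block is split into a lower and an upper half, the rotation is to the left (output byte i is input byte (i + r) mod n), and the function name `rotate_block` is hypothetical.

```python
def rotate_block(block, r):
    """Rotate `block` left by r bytes using two half rotations and a select."""
    n = len(block)
    if n < 2 or n % 2:
        raise ValueError("block length must be even and at least 2")
    m = n // 2
    r %= n
    s, q = r % m, r // m          # first layer amount, half-swap flag

    # First layer: rotate each half independently by the same amount s.
    a, b = block[:m], block[m:]
    ra = a[s:] + a[:s]
    rb = b[s:] + b[:s]

    # Second layer: each output byte picks from ra or rb at the same offset.
    out = bytearray(n)
    for i in range(n):
        j, half = i % m, i // m
        wrapped = 1 if j + s >= m else 0   # this offset crossed a half boundary
        pick_b = (half + q + wrapped) % 2
        out[i] = rb[j] if pick_b else ra[j]
    return bytes(out)
```

Each output byte is chosen from only two candidates, so the selection layer stays small however large the half rotations are; the result matches a direct rotation `block[r:] + block[:r]` for every rotation amount.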
