SHARDING FOR SYNCHRONOUS PROCESSORS

    Publication No.: US20220300450A1

    Publication Date: 2022-09-22

    Application No.: US17636805

    Filing Date: 2020-08-20

    Applicant: Google LLC

    IPC Classification: G06F15/82 G06F17/16

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sharding dataflow graphs for a device having multiple synchronous tiles. One of the methods includes receiving a representation of a dataflow graph comprising a plurality of nodes that each represent respective matrix operations to be performed by a device having a plurality of synchronous tiles. Candidate allocations of respective portions of the dataflow graph to each tile of the plurality of synchronous tiles are evaluated according to one or more resource constraints of the device. One of the candidate allocations is selected based on evaluating each candidate allocation.
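
    Illustration (not part of the patent record): the evaluate-then-select step described in the abstract can be sketched as a brute-force search. Everything named here is an assumption made for the example: the per-node compute cost and memory figures, the single memory-capacity constraint per tile, and the choice of "largest per-tile compute load" as the score. Graph edges and communication cost are ignored, and the patent may evaluate candidates quite differently.

```python
from itertools import product


def select_allocation(node_cost, node_mem, num_tiles, tile_mem_capacity):
    """Enumerate node-to-tile allocations and return (allocation, load).

    Candidates that exceed the (hypothetical) per-tile memory capacity are
    rejected; among the rest, the one with the smallest maximum per-tile
    compute load is kept. Returns (None, None) if nothing is feasible.
    """
    nodes = sorted(node_cost)
    best, best_load = None, None
    for assignment in product(range(num_tiles), repeat=len(nodes)):
        mem = [0] * num_tiles
        load = [0] * num_tiles
        for node, tile in zip(nodes, assignment):
            mem[tile] += node_mem[node]
            load[tile] += node_cost[node]
        if max(mem) > tile_mem_capacity:
            continue  # violates the resource constraint
        worst = max(load)
        if best_load is None or worst < best_load:
            best, best_load = dict(zip(nodes, assignment)), worst
    return best, best_load


cost = {"a": 4, "b": 3, "c": 2, "d": 1}
mem = {"a": 1, "b": 1, "c": 1, "d": 1}
allocation, load = select_allocation(cost, mem, num_tiles=2, tile_mem_capacity=2)
```

    Exhaustive enumeration grows exponentially with the number of nodes, so it only serves to make the evaluate-and-select idea concrete on a toy graph.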

    DUAL-MODE OPERATION OF APPLICATION SPECIFIC INTEGRATED CIRCUITS

    Publication No.: US20220286135A1

    Publication Date: 2022-09-08

    Application No.: US17634744

    Filing Date: 2020-08-14

    Applicant: Google LLC

    Inventor: Reiner Pope

    Abstract: A method for operating an integrated circuit chip including multiple tiles (202a-202d) includes determining a configuration for the tiles for execution of a computation. When the configuration for the tiles satisfies a first criterion, the integrated circuit is operated in a first mode, including concurrently receiving respective input data (208a, 208b) at each of the tiles (202a-202d). When the configuration for the tiles satisfies a second criterion, the integrated circuit is operated in a second mode, including: at a first time, concurrently receiving respective first input data (208a, 208b) at each tile (202a, 202b) of a first group of tiles; at the first time, storing respective second input data (208a, 208b) in each of multiple delay registers (212a, 212b), each delay register corresponding to a tile (202c, 202d) of a second group of tiles; at a second time, releasing the second input data from the delay registers (212a, 212b) and receiving the released respective second input data at each tile (202c, 202d) of the second group of tiles.
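
    Illustration (not part of the patent record): a toy model of the two modes, assuming two discrete time steps (0 and 1) and one delay register per tile of the second group. The function name, the dictionary layout, and the data labels are hypothetical; the tile labels reuse the reference numerals from the abstract.

```python
def run_dual_mode(inputs, second_group, use_second_mode):
    """Return {tile: (arrival_time, data)} for one round of input delivery.

    First mode: every tile receives its input at time 0.
    Second mode: tiles in second_group have their input parked in a delay
    register at time 0 and receive it at time 1.
    """
    received = {}
    delay_registers = {}
    # Time 0: deliver directly, or park the data in a delay register.
    for tile, data in inputs.items():
        if use_second_mode and tile in second_group:
            delay_registers[tile] = data
        else:
            received[tile] = (0, data)
    # Time 1: release whatever was held in the delay registers.
    for tile, data in delay_registers.items():
        received[tile] = (1, data)
    return received


inputs = {"202a": "x0", "202b": "x1", "202c": "x0", "202d": "x1"}
first_mode = run_dual_mode(inputs, {"202c", "202d"}, use_second_mode=False)
second_mode = run_dual_mode(inputs, {"202c", "202d"}, use_second_mode=True)
```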

    SIGNED MULTIWORD MULTIPLIER
    Type: Patent application

    Publication No.: US20220283777A1

    Publication Date: 2022-09-08

    Application No.: US17637531

    Filing Date: 2020-08-20

    Applicant: Google LLC

    Inventor: Reiner Pope

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a hardware circuit configured as a signed multiword multiplier. The circuit includes a processing circuit that receives inputs that each have a respective bit-width. The processing circuit can represent at least one input as a signed multiword input based on the first input having a bit-width that exceeds a fixed bit-width of the hardware circuit. The circuit includes signed multipliers that are each configured to multiply signed inputs. Each signed multiplier includes multiplication circuitry configured to: receive the signed multiword input; receive a signed second input; and generate a signed output in response to multiplying the signed multiword input with the signed second input.
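
    Illustration (not part of the patent record): one way to multiply a wide signed value using only narrow signed multipliers is to split it into signed words. The sketch below assumes an 8-bit multiplier width and a balanced decomposition in which every word lies in [-128, 127]; the abstract does not say which encoding the circuit uses, and all function names are hypothetical.

```python
def to_signed_words(x, word_bits=8):
    """Split integer x into little-endian signed words.

    Each word lies in [-2**(word_bits-1), 2**(word_bits-1)), so it fits a
    signed multiplier of that width, and sum(word_i * 2**(i*word_bits)) == x.
    """
    half = 1 << (word_bits - 1)
    base = 1 << word_bits
    words = []
    while True:
        word = ((x + half) % base) - half  # signed remainder of x
        words.append(word)
        x = (x - word) >> word_bits        # exact: x - word is a multiple of base
        if x == 0:
            return words


def signed_multiply(a, b, word_bits=8):
    """Model one fixed-width signed multiplier; reject operands that do not fit."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operand exceeds the multiplier's bit-width")
    return a * b


def multiword_multiply(x, y, word_bits=8):
    """Multiply wide x by narrow y using only word-sized signed products."""
    words = to_signed_words(x, word_bits)
    return sum(signed_multiply(w, y, word_bits) << (i * word_bits)
               for i, w in enumerate(words))


product_value = multiword_multiply(300, -7)
```

    With this balanced encoding some 16-bit values need three words rather than two (32767, for example), which is one reason a real circuit might choose a different representation.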

    INITIALIZING ON-CHIP OPERATIONS
    Type: Patent application

    Publication No.: US20220277125A1

    Publication Date: 2022-09-01

    Application No.: US17636785

    Filing Date: 2020-08-20

    Applicant: Google LLC

    IPC Classification: G06F30/347 H04L49/253

    Abstract: A method of configuring an integrated circuit including multiple hardware tiles, includes: establishing a data forwarding path through the multiple hardware tiles by configuring each hardware tile, except for a last hardware tile, of the multiple hardware tiles to be in a data forwarding state, in which configuring each hardware tile, except for the last hardware tile, to be in a forwarding state includes installing a respective forwarding state counter specifying a corresponding predefined length of time that the hardware tile is in the data forwarding state; supplying, along the data forwarding path, each hardware tile of the plurality of hardware tiles with a respective program data packet comprising program data for the hardware tile; and installing, for each hardware tile of the multiple hardware tiles, the respective program data.
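
    Illustration (not part of the patent record): a small simulation of program loading along a chain of tiles. It assumes one packet enters the chain per time step, so each forwarding state counter can be expressed as a packet count rather than a length of time, and it assumes the packet for the farthest tile is sent first. The function name and packet labels are hypothetical.

```python
def load_programs(packets):
    """Simulate loading a chain of len(packets) tiles.

    Tile i (0 is nearest the input) starts with a forwarding counter of
    n - 1 - i, so the last tile never forwards. A tile passes packets along
    while its counter is positive, then installs the next packet it sees.
    Returns the packet installed at each tile.
    """
    n = len(packets)
    counters = [n - 1 - i for i in range(n)]
    installed = [None] * n
    for packet in packets:
        for tile in range(n):
            if counters[tile] > 0:
                counters[tile] -= 1  # still forwarding: pass the packet on
                continue
            if installed[tile] is None:
                installed[tile] = packet  # forwarding ended: keep this packet
                break
    return installed


result = load_programs(["prog_tile2", "prog_tile1", "prog_tile0"])
```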

    CONTROL OF MACHINE-LEARNING SYSTEMS

    Publication No.: US20220413721A1

    Publication Date: 2022-12-29

    Application No.: US17852059

    Filing Date: 2022-06-28

    Applicant: Google LLC

    IPC Classification: G06F3/06 G06F7/544

    Abstract: A method includes: receiving control data at a first data selector of a plurality of data selectors, in which the control data comprises (i) a configuration registry address specifying a location in a configuration state registry and (ii) configuration data specifying a circuit configuration state of a circuit element of a computational circuit; transferring the control data, from the first data selector, to an entry in a trigger table registry; responsive to a first trigger event occurring, transferring the configuration data to the location in the configuration state registry specified by the configuration registry address; and updating a state of the circuit element based on the configuration data.
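
    Illustration (not part of the patent record): the stage-then-commit flow in the abstract can be modeled with two dictionaries and a list. The class name, the method names, and the idea of tagging each trigger table entry with a trigger identifier are assumptions for the example; the abstract does not say how entries are matched to trigger events.

```python
class TriggerTable:
    """Toy model: control data waits in a trigger table until its trigger fires."""

    def __init__(self):
        self.pending = []       # trigger table registry entries
        self.config_state = {}  # configuration state registry: address -> data

    def stage(self, trigger_id, address, config_data):
        """Store control data (address + configuration data) in the trigger table."""
        self.pending.append((trigger_id, address, config_data))

    def fire(self, trigger_id):
        """On a trigger event, copy matching entries into the state registry."""
        remaining = []
        for entry_trigger, address, config_data in self.pending:
            if entry_trigger == trigger_id:
                self.config_state[address] = config_data
            else:
                remaining.append((entry_trigger, address, config_data))
        self.pending = remaining


table = TriggerTable()
table.stage("t0", 0x10, "mux=1")
table.stage("t1", 0x14, "mux=0")
state_before = dict(table.config_state)
table.fire("t0")
```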

    COMPILATION FOR SYNCHRONOUS PROCESSOR

    Publication No.: US20220276847A1

    Publication Date: 2022-09-01

    Application No.: US17636579

    Filing Date: 2020-08-21

    Applicant: Google LLC

    IPC Classification: G06F8/41 G06F9/48

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for compiling latency insensitive programs for a synchronous processor. One of the methods includes receiving an intermediate representation of a program specifying operations to be performed by a plurality of respective components of a synchronous processor, wherein the intermediate representation assigns, to each operation of the plurality of operations, a respective clock cycle value at which the operation is scheduled to be executed by the synchronous processor. The intermediate representation is processed to generate a respective update window for each operation in the intermediate representation requiring a hardware configuration update, wherein the update window specifies a time range during which a configuration update instruction can be executed to effectuate the hardware configuration update. Configuration update instructions are scheduled to occur during one or more update windows and according to the configuration constraints of the synchronous processor.
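
    Illustration (not part of the patent record): a sketch of deriving update windows from a cycle-annotated operation list and then placing the update instructions. It assumes each operation is a (cycle, component, config) tuple, that a window runs from the cycle after the component's previous operation to the cycle before the operation that needs the new configuration, and that the only configuration constraint is "at most one update instruction per cycle". The placement is a simple greedy pass and is not claimed to match the patent's scheduler.

```python
def update_windows(ops):
    """Return (component, config, earliest, latest) for every configuration change."""
    windows = []
    last = {}  # component -> (cycle, config) of its most recent operation
    for cycle, component, config in sorted(ops):
        prev_cycle, prev_config = last.get(component, (-1, None))
        if config != prev_config:
            windows.append((component, config, prev_cycle + 1, cycle - 1))
        last[component] = (cycle, config)
    return windows


def schedule_updates(windows):
    """Greedily give each window the earliest free cycle, tightest deadline first."""
    taken = set()
    schedule = []
    for component, config, earliest, latest in sorted(windows, key=lambda w: w[3]):
        cycle = next((c for c in range(earliest, latest + 1) if c not in taken), None)
        if cycle is None:
            raise ValueError(f"no free cycle for {component} -> {config}")
        taken.add(cycle)
        schedule.append((cycle, component, config))
    return sorted(schedule)


ops = [(3, "mux", "A"), (6, "mux", "B"), (4, "alu", "X"), (8, "mux", "B")]
windows = update_windows(ops)
schedule = schedule_updates(windows)
```

    The operation at cycle 8 produces no window because the "mux" component is already in configuration "B" by then.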

    Initializing on-chip operations
    Type: Granted patent

    Publication No.: US12124783B2

    Publication Date: 2024-10-22

    Application No.: US17636785

    Filing Date: 2020-08-20

    Applicant: Google LLC

    Abstract: A method of configuring an integrated circuit including multiple hardware tiles, includes: establishing a data forwarding path through the multiple hardware tiles by configuring each hardware tile, except for a last hardware tile, of the multiple hardware tiles to be in a data forwarding state, in which configuring each hardware tile, except for the last hardware tile, to be in a forwarding state includes installing a respective forwarding state counter specifying a corresponding predefined length of time that the hardware tile is in the data forwarding state; supplying, along the data forwarding path, each hardware tile of the plurality of hardware tiles with a respective program data packet comprising program data for the hardware tile; and installing, for each hardware tile of the multiple hardware tiles, the respective program data.

    Dual-mode operation of application specific integrated circuits

    Publication No.: US11811401B2

    Publication Date: 2023-11-07

    Application No.: US17634744

    Filing Date: 2020-08-14

    Applicant: Google LLC

    Inventor: Reiner Pope

    Abstract: A method for operating an integrated circuit chip including multiple tiles (202a-202d) includes determining a configuration for the tiles for execution of a computation. When the configuration for the tiles satisfies a first criterion, the integrated circuit is operated in a first mode, including concurrently receiving respective input data (208a, 208b) at each of the tiles (202a-202d). When the configuration for the tiles satisfies a second criterion, the integrated circuit is operated in a second mode, including: at a first time, concurrently receiving respective first input data (208a, 208b) at each tile (202a, 202b) of a first group of tiles; at the first time, storing respective second input data (208a, 208b) in each of multiple delay registers (212a, 212b), each delay register corresponding to a tile (202c, 202d) of a second group of tiles; at a second time, releasing the second input data from the delay registers (212a, 212b) and receiving the released respective second input data at each tile (202c, 202d) of the second group of tiles.

    PROPAGATION LATENCY REDUCTION
    Type: Patent application

    Publication No.: US20220318638A1

    Publication Date: 2022-10-06

    Application No.: US17636662

    Filing Date: 2020-08-20

    Applicant: Google LLC

    IPC Classification: G06N3/08 G06F17/16

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for scheduling operations to reduce propagation latency between tiles of an accelerator. One of the methods includes receiving a request to generate a schedule for a first layer of a program to be executed by an accelerator configured to perform matrix operations at least partially in parallel, wherein the program defines a plurality of layers including the first layer, each layer of the program defining matrix operations to be performed using a respective matrix of values. A plurality of initial blocks of the schedule are assigned according to an initial assignment direction. The assignment direction is switched starting at a particular cycle so that blocks processed after the selected particular cycle are processed along a different second dimension of the first matrix. All remaining unassigned blocks are then assigned according to the switched assignment direction.
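
    Illustration (not part of the patent record): a sketch of a block schedule that switches assignment direction partway through. It assumes the matrix is divided into a rows-by-cols grid of blocks, one block is processed per cycle, the initial direction walks along rows, and the switched direction walks along columns. The function name and the switch_cycle parameter are hypothetical, and how the patent picks the switch cycle is not modeled.

```python
def schedule_blocks(rows, cols, switch_cycle):
    """Return the block processed at each cycle as a list of (row, col).

    Cycles before switch_cycle follow row-major order; from switch_cycle on,
    every still-unassigned block is taken in column-major order.
    """
    row_major = [(r, c) for r in range(rows) for c in range(cols)]
    initial = row_major[:switch_cycle]
    assigned = set(initial)
    remaining = [(r, c) for c in range(cols) for r in range(rows)
                 if (r, c) not in assigned]
    return initial + remaining


order = schedule_blocks(rows=3, cols=3, switch_cycle=3)
```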