SYSTOLIC ARRAY ACCELERATOR SYSTEMS AND METHODS

    Publication No.: US20200272596A1

    Publication date: 2020-08-27

    Application No.: US16283795

    Application date: 2019-02-24

    Abstract: The present disclosure is directed to systems and methods for decomposing systolic array circuitry to provide a plurality of N×N systolic sub-array circuits, apportioning a first tensor or array into a plurality of N×M first input arrays, and apportioning a second tensor or array into a plurality of M×N second input arrays. Systolic array control circuitry transfers corresponding ones of the first input arrays and second input arrays to a respective one of the plurality of N×N systolic sub-array circuits. As the elements included in the first input array and the elements included in the second input array are transferred to the systolic sub-array, the systolic sub-array performs one or more mathematical operations using the first and the second input arrays. The systems and methods beneficially improve the usage of the systolic array circuitry, thereby advantageously reducing the number of clock cycles needed to perform a given number of calculations.
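
    The apportioning described in this abstract can be illustrated with a small software model. The sketch below is not the patented circuitry: it only shows how a first matrix might be cut into N×M row blocks and a second into M×N column blocks, with each block pair handed to a function standing in for one N×N sub-array. All function names are hypothetical, and cycle-level timing is not modeled.

```python
def split_rows(a, n):
    """Cut matrix a (a list of rows) into blocks of n rows each (N x M blocks)."""
    return [a[i:i + n] for i in range(0, len(a), n)]


def split_cols(b, n):
    """Cut matrix b into blocks of n columns each (M x N blocks)."""
    width = len(b[0])
    return [[row[j:j + n] for row in b] for j in range(0, width, n)]


def sub_array_multiply(a_blk, b_blk):
    """Stand-in for one N x N sub-array: multiply an N x M block by an M x N block."""
    m = len(b_blk)
    return [[sum(a_row[k] * b_blk[k][j] for k in range(m))
             for j in range(len(b_blk[0]))]
            for a_row in a_blk]


def tiled_matmul(a, b, n):
    """Multiply a by b by dispatching block pairs to modeled sub-arrays."""
    c = [[0] * len(b[0]) for _ in range(len(a))]
    for bi, a_blk in enumerate(split_rows(a, n)):
        for bj, b_blk in enumerate(split_cols(b, n)):
            tile = sub_array_multiply(a_blk, b_blk)
            # Place the N x N result tile at its position in the output.
            for i, tile_row in enumerate(tile):
                for j, value in enumerate(tile_row):
                    c[bi * n + i][bj * n + j] = value
    return c
```

    In this model every block pair is independent, which is the property that would let separate sub-arrays work at the same time in hardware.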

    Structures and operations of integrated circuits having network of configurable switches

    Publication No.: US10476492B2

    Publication date: 2019-11-12

    Application No.: US16201915

    Application date: 2018-11-27

    Abstract: Embodiments herein may present an integrated circuit including a switch, where the switch together with other switches forms a network of switches to perform a sequence of operations according to a structure of a collective tree. The switch includes a first number of input ports, a second number of output ports, a configurable crossbar to selectively couple the first number of input ports to the second number of output ports, and a computation engine coupled to the first number of input ports, the second number of output ports, and the crossbar. The computation engine of the switch performs an operation corresponding to an operation represented by a node of the collective tree. The switch further includes one or more registers to selectively configure the first number of input ports and the configurable crossbar. Other embodiments may be described and/or claimed.
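
    As a rough software analogy for the collective tree in this abstract, the sketch below treats each switch as a tree node whose computation engine combines the values arriving from its children. The `Switch` class, its fields, and the choice of addition as the operation are assumptions made for illustration; ports, the crossbar, and the configuration registers are not modeled.

```python
import operator


class Switch:
    """Hypothetical model of one switch acting as a node of a collective tree."""

    def __init__(self, op=operator.add, children=(), value=None):
        self.op = op                  # operation performed by the computation engine
        self.children = list(children)
        self.value = value            # only used by leaf nodes

    def evaluate(self):
        """Combine the children's results with this node's operation."""
        if not self.children:
            return self.value
        inputs = [child.evaluate() for child in self.children]
        result = inputs[0]
        for item in inputs[1:]:
            result = self.op(result, item)
        return result


# Four leaves feed two intermediate switches, which feed one root switch.
leaves = [Switch(value=v) for v in (1, 2, 3, 4)]
root = Switch(children=[Switch(children=leaves[:2]), Switch(children=leaves[2:])])
total = root.evaluate()
```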

    LARGE-SCALE MATRIX RESTRUCTURING AND MATRIX-SCALAR OPERATIONS

    Publication No.: US20220100508A1

    Publication date: 2022-03-31

    Application No.: US17134251

    Application date: 2020-12-25

    Abstract: Embodiments of apparatuses and methods for copying and operating on matrix elements are described. In embodiments, an apparatus includes a hardware instruction decoder to decode a single instruction and execution circuitry, coupled to the hardware instruction decoder, to perform one or more operations corresponding to the single instruction. The single instruction has a first operand to reference a base address of a first representation of a source matrix and a second operand to reference a base address of a second representation of a destination matrix. The one or more operations include copying elements of the source matrix to corresponding locations in the destination matrix and filling empty elements of the destination matrix with a single value.
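
    The copy-and-fill behavior can be sketched in software under one stated assumption: the abstract does not say what the two representations are, so this example supposes the source is a coordinate map of `(row, column)` to value and the destination is a dense list of rows. The function name and parameters are hypothetical.

```python
def copy_and_fill(src_entries, rows, cols, fill_value):
    """Copy source elements into a dense destination and fill the rest.

    src_entries: dict mapping (row, col) to a value (assumed representation).
    """
    # Start with every destination element set to the single fill value.
    dst = [[fill_value] * cols for _ in range(rows)]
    # Copy each source element to its corresponding destination location.
    for (r, c), value in src_entries.items():
        dst[r][c] = value
    return dst


dense = copy_and_fill({(0, 1): 5, (1, 0): 7}, rows=2, cols=2, fill_value=0)
```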

    ARRAY BROADCAST AND REDUCTION SYSTEMS AND METHODS

    Publication No.: US20200310795A1

    Publication date: 2020-10-01

    Application No.: US16369846

    Application date: 2019-03-29

    Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may include broadcast of a single data value to each destination address. The broadcast instruction may include broadcast of a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and performing one or more operations using the retrieved data. Since the DMA control circuitry, rather than the processor circuitry, performs the broadcast and reduction operations, system speed and efficiency are beneficially enhanced.
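
    The broadcast and reduction operations in this abstract can be mimicked with a toy memory model. In the sketch below, system memory is a flat Python list and addresses are list indices. Both function names are hypothetical and do not correspond to any instruction in the disclosed ISA.

```python
import operator
from functools import reduce as fold


def dma_broadcast(memory, data, dest_addrs):
    """Write data (a list of one or more values) at each destination address."""
    for addr in dest_addrs:
        if addr < 0 or addr + len(data) > len(memory):
            raise IndexError("broadcast would write outside memory")
        memory[addr:addr + len(data)] = data


def dma_reduce(memory, src_addrs, op=operator.add):
    """Read one value from each source address and combine them with op."""
    return fold(op, (memory[addr] for addr in src_addrs))


mem = [0] * 8
dma_broadcast(mem, [9], [0, 3])        # single value to two addresses
dma_broadcast(mem, [1, 2], [4, 6])     # two-element array to two addresses
total = dma_reduce(mem, [0, 3, 4])     # sum of the values at three addresses
```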

    STRUCTURES AND OPERATIONS OF INTEGRATED CIRCUITS HAVING NETWORK OF CONFIGURABLE SWITCHES

    Publication No.: US20190109590A1

    Publication date: 2019-04-11

    Application No.: US16201915

    Application date: 2018-11-27

    CPC classification number: H03K17/005 G11C7/1006 H03K17/007 H03K19/1733

    Abstract: Embodiments herein may present an integrated circuit including a switch, where the switch together with other switches forms a network of switches to perform a sequence of operations according to a structure of a collective tree. The switch includes a first number of input ports, a second number of output ports, a configurable crossbar to selectively couple the first number of input ports to the second number of output ports, and a computation engine coupled to the first number of input ports, the second number of output ports, and the crossbar. The computation engine of the switch performs an operation corresponding to an operation represented by a node of the collective tree. The switch further includes one or more registers to selectively configure the first number of input ports and the configurable crossbar. Other embodiments may be described and/or claimed.

    Systolic array accelerator systems and methods

    Publication No.: US11003619B2

    Publication date: 2021-05-11

    Application No.: US16283795

    Application date: 2019-02-24

    Abstract: The present disclosure is directed to systems and methods for decomposing systolic array circuitry to provide a plurality of N×N systolic sub-array circuits, apportioning a first tensor or array into a plurality of N×M first input arrays, and apportioning a second tensor or array into a plurality of M×N second input arrays. Systolic array control circuitry transfers corresponding ones of the first input arrays and second input arrays to a respective one of the plurality of N×N systolic sub-array circuits. As the elements included in the first input array and the elements included in the second input array are transferred to the systolic sub-array, the systolic sub-array performs one or more mathematical operations using the first and the second input arrays. The systems and methods beneficially improve the usage of the systolic array circuitry, thereby advantageously reducing the number of clock cycles needed to perform a given number of calculations.
