Performing consecutive MAC operations on a set of data using different kernels in a MAC circuit

    Publication No.: US11429850B2

    Publication Date: 2022-08-30

    Application No.: US16040357

    Application Date: 2018-07-19

    Applicant: Xilinx, Inc.

    Abstract: A circuit arrangement includes an array of multiply-and-accumulate (MAC) circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an input feature map (IFM) at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first output feature map (OFM) depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.
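The key idea in the abstract above is that one set of IFM data elements, fetched once at the slow rate, is reused across consecutive fast MAC cycles against a different cached kernel each cycle. A minimal Python sketch of that reuse pattern, with illustrative names not taken from the patent:

```python
# Hypothetical model: one MAC circuit holds several kernels in its
# cache (one per OFM depth index) and applies each of them, in
# consecutive MAC cycles, to the same set of IFM data elements.

def mac_cycles(ifm_data, kernel_cache):
    """Return one partial OFM value per cached kernel.

    ifm_data: IFM data elements received once (the slow, first rate).
    kernel_cache: list of kernels (weight lists) indexed by OFM depth,
                  consumed over consecutive fast MAC cycles.
    """
    ofm = []
    for kernel in kernel_cache:          # one MAC cycle per kernel
        acc = 0
        for x, w in zip(ifm_data, kernel):
            acc += x * w                 # multiply-and-accumulate
        ofm.append(acc)
    return ofm

# The same IFM set is processed twice, once per OFM depth index,
# without being re-fetched from outside the MAC circuit.
out = mac_cycles([1, 2, 3], [[1, 0, 1], [2, 2, 2]])
assert out == [4, 12]
```

Because the MAC-cycle rate is faster than the IFM delivery rate, the loop over cached kernels completes before the next set of data elements arrives.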

    Memory arrangement for tensor data

    Publication No.: US10346093B1

    Publication Date: 2019-07-09

    Application No.: US15923950

    Application Date: 2018-03-16

    Applicant: Xilinx, Inc.

    Abstract: Disclosed circuitry includes RAM circuits, a memory controller, and an array of processing circuits. Each RAM circuit includes a read port and a write port. The memory controller accesses tensor data arranged in banks of tensor buffers in the RAM circuits. The memory controller is coupled to each read port by shared read control signal lines and to each write port by shared write control signal lines. The memory controller generates read control and write control signals for accessing different ones of the tensor buffers at different times. The array of processing circuits is coupled to one of the RAM circuits. The array includes multiple rows and multiple columns of processing circuits for performing tensor operations on the tensor data. The processing circuits in each row in each array of processing circuits are coupled to input the same tensor data.
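One common way to access "different ones of the tensor buffers at different times" under shared control signals is bank ping-ponging: one bank receives new tensor data while another feeds the processing array. A minimal sketch under that assumption (the class and method names are illustrative, not from the patent):

```python
# Illustrative sketch: tensor buffers arranged in banks, with a single
# controller driving shared read/write control. A bank-select swap lets
# one bank be written while another is read.

class TensorBanks:
    def __init__(self, num_banks, depth):
        self.banks = [[None] * depth for _ in range(num_banks)]
        self.write_bank = 0          # bank currently receiving data
        self.read_bank = 1           # bank currently feeding the array

    def write(self, addr, value):
        # Shared write control: the same address drives every RAM's
        # write port; the bank select picks which buffer is updated.
        self.banks[self.write_bank][addr] = value

    def read(self, addr):
        # Shared read control: every processing row in the array is
        # presented with the same tensor data from the read bank.
        return self.banks[self.read_bank][addr]

    def swap(self):
        # Access different tensor buffers at different times by
        # exchanging the read and write banks.
        self.write_bank, self.read_bank = self.read_bank, self.write_bank

banks = TensorBanks(num_banks=2, depth=4)
banks.write(0, "tensor-A")   # controller fills bank 0
banks.swap()                 # bank 0 now feeds the processing array
assert banks.read(0) == "tensor-A"
```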

    DATA TRANSFERS BETWEEN A MEMORY AND A DISTRIBUTED COMPUTE ARRAY

    Publication No.: US20210174848A1

    Publication Date: 2021-06-10

    Application No.: US16706437

    Application Date: 2019-12-06

    Applicant: Xilinx, Inc.

    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channel interfaces and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.
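The controller described above synchronizes independently arriving memory-channel data: it waits until every remote buffer holds data, then asserts one read enable seen by all buffers, so the compute array receives aligned data from every channel at once. A hedged sketch of that readiness-then-broadcast behavior (function and variable names are assumptions for illustration):

```python
# Hypothetical model of the broadcast scheme: remote buffers (one per
# memory channel, spread across dies) fill at different times; the
# controller releases data to the compute array only when all are ready.
from collections import deque

def broadcast_when_ready(remote_buffers):
    """Pop one element from every buffer only when all are non-empty."""
    if all(len(buf) > 0 for buf in remote_buffers):   # readiness check
        # Broadcast read enable: every buffer transfers in the same step.
        return [buf.popleft() for buf in remote_buffers]
    return None  # hold off: at least one channel has not delivered yet

bufs = [deque([10]), deque([20]), deque()]
assert broadcast_when_ready(bufs) is None   # third buffer still empty
bufs[2].append(30)
assert broadcast_when_ready(bufs) == [10, 20, 30]
```

Gating the transfer on all buffers being non-empty is what keeps the per-channel data streams aligned when they cross die boundaries with differing latencies.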

    PERFORMING CONSECUTIVE MAC OPERATIONS ON A SET OF DATA USING DIFFERENT KERNELS IN A MAC CIRCUIT

    Publication No.: US20200026989A1

    Publication Date: 2020-01-23

    Application No.: US16040357

    Application Date: 2018-07-19

    Applicant: Xilinx, Inc.

    Abstract: A circuit arrangement includes an array of multiply-and-accumulate (MAC) circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an input feature map (IFM) at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first output feature map (OFM) depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.

    Data transfers between a memory and a distributed compute array

    Publication No.: US11127442B2

    Publication Date: 2021-09-21

    Application No.: US16706437

    Application Date: 2019-12-06

    Applicant: Xilinx, Inc.

    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channel interfaces and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.

    Neural-network pooling
    Invention Grant

    Publication No.: US11531869B1

    Publication Date: 2022-12-20

    Application No.: US16368397

    Application Date: 2019-03-28

    Applicant: Xilinx, Inc.

    Abstract: Embodiments herein describe circuitry with improved efficiency when executing layers in a nested neural network. A nested neural network has at least one split operation where a tensor generated by a first layer is transmitted to, and processed by, several branches in the neural network. Each of these branches can have several layers with data dependencies that leave a multiply-add array sitting idle. In one embodiment, the circuitry can include a dedicated pre-pooler for performing a pre-pooling operation. Thus, the pre-pooling operation can be performed in parallel with other operations (e.g., the convolution performed by another layer). Once the multiply-add array is idle, the pre-pooling operation has already completed (or at least, has already started), which means the time the multiply-add array must wait before it can perform the next operation is reduced or eliminated.
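The scheduling idea in the abstract can be sketched in a few lines: a dedicated pre-pooler reduces the split tensor while the multiply-add array is busy with another branch's convolution, so the pooled result is ready (or nearly ready) by the time the array goes idle. The 1-D operations and names below are simplified stand-ins, not the patent's circuitry:

```python
# Sketch of overlapping pre-pooling with convolution. The thread pool
# plays the role of the dedicated pre-pooler running concurrently with
# the multiply-add array; both consume the same post-split tensor.
from concurrent.futures import ThreadPoolExecutor

def pre_pool(tensor, window=2):
    # Max-pool over non-overlapping windows of a 1-D tensor.
    return [max(tensor[i:i + window])
            for i in range(0, len(tensor), window)]

def convolve(tensor, kernel):
    # Valid 1-D convolution standing in for the other branch's layer,
    # i.e. the work that keeps the multiply-add array busy.
    k = len(kernel)
    return [sum(x * w for x, w in zip(tensor[i:i + k], kernel))
            for i in range(len(tensor) - k + 1)]

tensor = [1, 3, 2, 5, 4, 0]           # tensor produced by the split
with ThreadPoolExecutor() as pool:
    pooled_future = pool.submit(pre_pool, tensor)  # pre-pooler starts
    conv_out = convolve(tensor, [1, -1])           # array is busy here
pooled = pooled_future.result()  # already available once the array idles

assert pooled == [3, 5, 4]
assert conv_out == [-2, 1, -3, 1, 4]
```

The benefit is exactly the one the abstract claims: the pooling branch's result no longer serializes behind the convolution branch, so the array's idle wait shrinks or disappears.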

    Neural network controller
    Invention Grant

    Publication No.: US11429851B1

    Publication Date: 2022-08-30

    Application No.: US16219303

    Application Date: 2018-12-13

    Applicant: Xilinx, Inc.

    Abstract: Disclosed circuits and methods involve a first register configured to store a first convolutional neural network (CNN) instruction during processing of the first CNN instruction and a second register configured to store a second CNN instruction during processing of the second CNN instruction. Each of a plurality of address generation circuits is configured to generate one or more addresses in response to an input CNN instruction. Control circuitry is configured to select one of the first CNN instruction or the second CNN instruction as input to the address generation circuits.
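The two-register arrangement above is a double-buffering scheme: one register's instruction can be loaded while the other's is driving the address generators. A minimal sketch under that reading, with assumed instruction fields (`base`, `stride`, `count`) that are illustrative only:

```python
# Hypothetical model of the dual instruction registers: control
# circuitry selects which stored CNN instruction feeds the address
# generation circuits, so loading and processing can overlap.

def generate_addresses(instr):
    """One address generator: base + stride * i for `count` accesses."""
    return [instr["base"] + instr["stride"] * i
            for i in range(instr["count"])]

class InstructionRegisters:
    def __init__(self):
        self.regs = [None, None]   # first and second CNN instruction
        self.selected = 0          # control circuitry's selection

    def load(self, slot, instr):
        # Store an instruction while the other may still be processing.
        self.regs[slot] = instr

    def addresses(self):
        # The selected register is the input to the address generators.
        return generate_addresses(self.regs[self.selected])

ctrl = InstructionRegisters()
ctrl.load(0, {"base": 0, "stride": 4, "count": 3})
ctrl.load(1, {"base": 100, "stride": 8, "count": 2})
assert ctrl.addresses() == [0, 4, 8]
ctrl.selected = 1                  # switch to the second instruction
assert ctrl.addresses() == [100, 108]
```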
