Timing closure of circuit designs for integrated circuits

    公开(公告)号:US10366201B1

    公开(公告)日:2019-07-30

    申请号:US15495760

    申请日:2017-04-24

    Applicant: Xilinx, Inc.

    Abstract: Closing timing for a circuit design can include displaying, using a display device, a first region having a plurality of controls corresponding to a plurality of data sets generated at different times during a phase of a design flow for a circuit design, wherein each control selects a data set associated with the control, and displaying, using the display device, a second region configured to display a list of critical paths for data sets selected from the first region using one of the plurality of controls. Closing timing further can include displaying, using the display device, a third region configured to display a representation of a target integrated circuit including layouts for the critical paths of the list for implementations of the circuit design specified by the selected data sets.

    IMAGE PREPROCESSING FOR GENERALIZED IMAGE PROCESSING

    公开(公告)号:US20190114499A1

    公开(公告)日:2019-04-18

    申请号:US15786267

    申请日:2017-10-17

    Applicant: Xilinx, Inc.

    Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a first buffer configured to store a plurality of rows of the image data and output a row of the plurality of rows; a second buffer, coupled to the first buffer, including a plurality of storage locations to store a respective plurality of image samples of the row output by the first buffer; a plurality of shift registers; an interconnect network including a plurality of connections, each connection coupling a respective one of the plurality of shift registers to more than one of the plurality of storage locations, one or more of the plurality of storage locations being coupled to more than one of the plurality of connections; and a control circuit configured to load the plurality of shift registers with the plurality of image samples based on the plurality of connections and shift the plurality of shift registers to output the plurality of streams of image samples.

    Instruction set architecture for data processing array control

    公开(公告)号:US12248786B2

    公开(公告)日:2025-03-11

    申请号:US17818309

    申请日:2022-08-08

    Applicant: Xilinx, Inc.

    Abstract: Controlling a data processing (DP) array includes creating a replica of a register address space of the DP array based on the design and the DP array. A sequence of instructions, including write instructions and read instructions, is received. The write instructions correspond to buffer descriptors specifying runtime data movements for a design for a DP array. The write instructions are converted into transaction instructions and the read instructions are converted into wait instructions based on the replica of the register address space. The transaction instructions and the wait instructions are included in an instruction buffer. The instruction buffer is provided to a microcontroller configured to execute the transaction instructions and the wait instructions to implement the runtime data movements for the design as implemented in the DP array. In another aspect, the instruction buffer is stored in a file for subsequent execution by the microcontroller.

    Host-directed multi-layer neural network processing via per-layer work requests

    公开(公告)号:US11429848B2

    公开(公告)日:2022-08-30

    申请号:US15786102

    申请日:2017-10-17

    Applicant: Xilinx, Inc.

    Abstract: In disclosed approaches of neural network processing, a host computer system copies an input data matrix from host memory to a shared memory for performing neural network operations of a first layer of a neural network by a neural network accelerator. The host instructs the neural network accelerator to perform neural network operations of each layer of the neural network beginning with the input data matrix. The neural network accelerator performs neural network operations of each layer in response to the instruction from the host. The host waits until the neural network accelerator signals completion of performing neural network operations of layer i before instructing the neural network accelerator to commence performing neural network operations of layer i+1, for i≥1. The host instructs the neural network accelerator to use a results data matrix in the shared memory from layer i as an input data matrix for layer i+1 for i≥1.

    Sparse matrix processing circuitry
    29.
    发明授权

    公开(公告)号:US10936311B1

    公开(公告)日:2021-03-02

    申请号:US16505987

    申请日:2019-07-09

    Applicant: Xilinx, Inc.

    Abstract: Disclosed approaches for multiplying a sparse matrix by dense a vector or matrix include first memory banks for storage of column indices, second memory banks for storage of row indices, and third memory banks for storage of non-zero values of a sparse matrix. A pairing circuit distributes an input stream of vector elements across first first-in-first-out (FIFO) buffers according to the buffered column indices. Multiplication circuitry multiplies vector elements output from the first FIFO buffers by corresponding ones of the non-zero values from the third memory banks, and stores products in second FIFO buffers. Row-aligner circuitry organize the products output from the second FIFO buffers into third FIFO buffers according to row indices in the second memory banks. Accumulation circuitry accumulates respective totals from products output from the third FIFO buffers.

    Data format suitable for fast massively parallel general matrix multiplication in a programmable IC

    公开(公告)号:US10515135B1

    公开(公告)日:2019-12-24

    申请号:US15785688

    申请日:2017-10-17

    Applicant: Xilinx, Inc.

    Abstract: Methods and apparatus are described for performing data-intensive compute algorithms, such as fast massively parallel general matrix multiplication (GEMM), using a particular data format for both storing data to and reading data from memory. This data format may be utilized for arbitrarily-sized input matrices for GEMM implemented on a finite-size GEMM accelerator in the form of a rectangular compute array of digital signal processing (DSP) elements or similar compute cores. This data format solves the issue of double data rate (DDR) dynamic random access memory (DRAM) bandwidth by allowing both linear DDR addressing and single cycle loading of data into the compute array, avoiding input/output (I/O) and/or DDR bottlenecks.

Patent Agency Ranking