-
Publication No.: US10430539B1
Publication date: 2019-10-01
Application No.: US15382439
Filing date: 2016-12-16
Applicant: Xilinx, Inc.
Inventor: Chaithanya Dudha, Zhao Ma, Krishna Garlapati, Ashish Sirasao
IPC: G06F17/50
Abstract: Methods and apparatus relating generally to synthesis are described. In such a method, a directed graph for a circuit design is generated. A cascaded chain is identified in the directed graph with a timing violation. A pipeline register stage of the cascaded chain is moved (or added) to remove the timing violation. The circuit design is transformed to provide a netlist including the pipeline register stage.
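The register-moving step can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the patented method: the cascaded chain is modeled as a plain list of per-stage delays rather than a directed graph, there is exactly one pipeline register, and the names `segment_delays`, `has_violation`, and `move_register` are hypothetical.

```python
def segment_delays(delays, cut):
    """Combinational delay before and after a register placed at index `cut`."""
    return sum(delays[:cut]), sum(delays[cut:])


def has_violation(delays, cut, period):
    """True when either segment is slower than the clock period."""
    return max(segment_delays(delays, cut)) > period


def move_register(delays, cut, period):
    """Return a register position that meets timing, or None if none exists.

    The current position is kept when it already meets timing. Otherwise the
    position that minimizes the slower of the two segments is tried.
    """
    if len(delays) < 2:
        raise ValueError("a chain needs at least two stages to hold a register")
    if not has_violation(delays, cut, period):
        return cut
    best = min(range(1, len(delays)),
               key=lambda c: max(segment_delays(delays, c)))
    return None if has_violation(delays, best, period) else best


if __name__ == "__main__":
    chain = [3, 3, 1, 1]                  # hypothetical stage delays
    print(has_violation(chain, 3, 5))     # register after the third stage
    print(move_register(chain, 3, 5))     # position that balances the chain
```

A real synthesis flow would work on the full design graph and may add registers instead of moving one; the sketch only shows the balancing idea.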
-
Publication No.: US10331836B1
Publication date: 2019-06-25
Application No.: US15730431
Filing date: 2017-10-11
Applicant: Xilinx, Inc.
Inventor: Anup Hosangadi, Sumanta Datta, Aman Gayasen, Ashish Sirasao
IPC: G06F17/50, H03K19/173
Abstract: Implementing a circuit design can include determining a chain of a plurality of loop elements of a circuit design, wherein each loop element includes a bit select node configured to perform a bit assignment operation and a corresponding address calculation node, wherein the address calculation nodes use a common variable to calculate a starting bit location provided to the corresponding bit select node. In response to the determining, the chain is replicated, resulting in one chain for each value of the common variable, and each chain is transformed into a plurality of wires. A multiplexer is inserted into the circuit design. The plurality of wires for each chain is coupled to inputs of the multiplexer and the common variable is provided to the multiplexer as a select signal.
-
Publication No.: US20190114533A1
Publication date: 2019-04-18
Application No.: US15785679
Filing date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Sonal Santan, Soren T. Soe, Ashish Sirasao, Ehsan Ghasemi, Sean Settle
Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., an FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage, each of which corresponds to a different thread. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the tasks. Because the stages correspond to different threads, the library can process multiple packets in parallel, which can increase the utilization of the neural network accelerator on the hardware system.
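The three-stage, thread-per-stage pipeline can be sketched in Python with the standard `queue` and `threading` modules. This is an illustration under assumptions, not the library described in the application: the packet is a plain dictionary, the `execute` stage is a software stand-in for the FPGA, and all function names are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel that tells each stage to shut down


def run_stage(work, inbox, outbox):
    """Pull packets from inbox, apply this stage's work, push them to outbox."""
    while True:
        packet = inbox.get()
        if packet is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(work(packet))


def pre_process(packet):
    packet["input"] = [float(x) for x in packet["raw"]]
    return packet


def execute(packet):
    # Stand-in for the accelerator call; a real system would hand off to hardware.
    packet["output"] = [2.0 * x for x in packet["input"]]
    return packet


def post_process(packet):
    packet["result"] = sum(packet["output"])
    return packet


def run_pipeline(tasks):
    """Wrap each task in a packet and push it through the three stages."""
    stages = [pre_process, execute, post_process]
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for task_id, raw in enumerate(tasks):
        queues[0].put({"id": task_id, "raw": raw})
    queues[0].put(_STOP)
    results = []
    while True:
        packet = queues[-1].get()
        if packet is _STOP:
            break
        results.append(packet["result"])
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([[1, 2], [3]]))
```

Because each stage has its own thread and queue, one packet can be in post-processing while the next is in the execution stage, which is the overlap the abstract credits with improving accelerator utilization.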
-
Publication No.: US09875330B2
Publication date: 2018-01-23
Application No.: US14960176
Filing date: 2015-12-04
Applicant: Xilinx, Inc.
Inventor: Ilya K. Ganusov, Henri Fraisse, Ashish Sirasao, Alireza S. Kaviani
IPC: G06F17/50
CPC classification number: G06F17/5072, G06F17/5045, G06F17/505, G06F17/5054
Abstract: Disclosed approaches for processing a circuit design include identifying duplicate instances of a module in a representation of the circuit design. A processor circuit performs folding operations for at least one pair of the duplicate instances of the module. One instance of the duplicates is removed from the circuit design, and a multiplexer is inserted. The multiplexer receives and selects one of the input signals to the duplicate instances and provides the selected input signal to the remaining instance. For each flip-flop in the remaining instance, a pipelined flip-flop is inserted. Connections to a first clock signal in the remaining instance are replaced with connections to a second clock signal having twice the frequency of the first clock signal. An alignment circuit is inserted to receive the output signal from the first instance and provide concurrent first and second output signals.
-
Publication No.: US11568218B2
Publication date: 2023-01-31
Application No.: US15786288
Filing date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Ashish Sirasao
Abstract: A disclosed neural network processing system includes a host computer system, RAMs coupled to the host computer system, and neural network accelerators coupled to the RAMs, respectively. The host computer system is configured with software that when executed causes the host computer system to write input data and work requests to the RAMs. Each work request specifies a subset of neural network operations to perform and memory locations in a RAM of the input data and parameters. A graph of dependencies among neural network operations is built and additional dependencies added. The operations are partitioned into coarse grain tasks and fine grain subtasks for optimal scheduling for parallel execution. The subtasks are scheduled to accelerator kernels of matching capabilities. Each neural network accelerator is configured to read a work request from the respective RAM and perform the subset of neural network operations on the input data using the parameters.
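The dependency-graph step can be illustrated with a short Python sketch that groups operations into waves that could run in parallel. It assumes Python 3.9 or later for the standard `graphlib` module; the operation names and the `schedule_waves` helper are hypothetical, and the sketch ignores the coarse/fine partitioning and kernel-capability matching that the abstract also mentions.

```python
from graphlib import TopologicalSorter


def schedule_waves(dependencies):
    """Group operations into waves; every operation in a wave has all of its
    predecessors completed by earlier waves.

    `dependencies` maps each operation name to the names it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    # Hypothetical network fragment: two branches that rejoin.
    deps = {
        "conv1": [],
        "relu1": ["conv1"],
        "conv2": ["relu1"],
        "pool": ["relu1"],
        "concat": ["conv2", "pool"],
    }
    for wave in schedule_waves(deps):
        print(wave)
```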
-
Publication No.: US11386644B2
Publication date: 2022-07-12
Application No.: US15786267
Filing date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Elliott Delaye, Ashish Sirasao, Aaron Ng, Yongjun Wu, Jindrich Zejda
Abstract: An example preprocessor circuit includes: a first buffer configured to store rows of image data and output a row thereof; a second buffer, coupled to the first buffer, including storage locations to store respective image samples of the row output by the first buffer; shift registers; an interconnect network including connections, each connection coupling a respective one of the shift registers to more than one of the storage locations, one or more of the storage locations being coupled to more than one of the connections; and a control circuit configured to load the shift registers with the image samples based on the connections and shift the shift registers to output streams of image samples.
-
Publication No.: US11204747B1
Publication date: 2021-12-21
Application No.: US15786395
Filing date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda, Elliott Delaye, Yongjun Wu, Aaron Ng, Ashish Sirasao, Khang K. Dao, Christopher J. Case
Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator that operate on two heterogeneous computing systems. For example, the neural network application may execute on a central processing unit (CPU) in a computing system while the neural network accelerator executes on an FPGA. As a result, when moving a software-hardware boundary between the two heterogeneous systems, changes may be made to both the neural network application (using software code) and to the accelerator (using RTL). The embodiments herein describe a software-defined approach where shared interface code is used to express both sides of the interface between the two heterogeneous systems in a single abstraction (e.g., a software class).
-
Publication No.: US11106968B1
Publication date: 2021-08-31
Application No.: US15989075
Filing date: 2018-05-24
Applicant: Xilinx, Inc.
Inventor: Ehsan Ghasemi, Elliott Delaye, Ashish Sirasao
IPC: G06N3/04
Abstract: A circuit arrangement includes a buffer, a height traversal circuit configured to generate a sequence of IFM height values in response to first control signals, a width traversal circuit configured to generate a sequence of IFM width values in response to second control signals, a control circuit, and an address generation circuit. The control circuit is configured to input an OFM height, an OFM width, a kernel height, and a kernel width; generate the first control signals at times based on the OFM height and the kernel height; and generate the second control signals at times based on the OFM width and the kernel width. The address generation circuit is configured to generate a sequence of addresses based on the sequences of IFM height values and IFM width values, provide the sequence of addresses to the buffer, and enable reading from the buffer.
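A software model of the address sequence can make the height and width traversal concrete. The sketch below is an assumption-laden illustration, not the claimed circuit: it assumes stride 1, no padding, a row-major input feature map (IFM) buffer, and a particular loop order, none of which the abstract specifies. The name `ifm_addresses` is hypothetical.

```python
def ifm_addresses(ofm_height, ofm_width, kernel_height, kernel_width):
    """List the IFM buffer addresses read while producing each OFM element.

    Assumes stride 1 and no padding, so the IFM is
    (ofm_height + kernel_height - 1) rows by (ofm_width + kernel_width - 1) columns.
    """
    ifm_width = ofm_width + kernel_width - 1
    addresses = []
    for out_row in range(ofm_height):                  # height traversal
        for out_col in range(ofm_width):               # width traversal
            for k_row in range(kernel_height):
                for k_col in range(kernel_width):
                    ifm_row = out_row + k_row
                    ifm_col = out_col + k_col
                    addresses.append(ifm_row * ifm_width + ifm_col)
    return addresses


if __name__ == "__main__":
    # 2x2 output with a 2x2 kernel reads from a 3x3 input.
    print(ifm_addresses(2, 2, 2, 2))
```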
-
Publication No.: US10726175B1
Publication date: 2020-07-28
Application No.: US16291952
Filing date: 2019-03-04
Applicant: Xilinx, Inc.
Inventor: Chaithanya Dudha, Satyaprakash Pareek, Bing Tian, Ashish Sirasao
IPC: G06F30/30, H01L27/02, G06F30/327, G06F30/398, G06F30/00
Abstract: A memory optimization method includes identifying, within a circuit design, a memory having an arithmetic operator at an output side and/or an input side of the memory. The memory may include a read-only memory (ROM). In some examples, an input of the arithmetic operator includes a constant value. In some embodiments, the memory optimization method further includes absorbing a function of the arithmetic operator into the memory. By way of example, absorbing the function includes modifying contents of the memory based on the function of the arithmetic operator to provide an updated memory and removing the arithmetic operator from the circuit design.
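Absorbing a constant adder into a ROM can be shown with a small Python model. This is a sketch under assumptions: the ROM is a list of unsigned words, the operator is an adder with a constant operand, the output adder wraps at the word width, and the address adder wraps at the ROM depth. The function names are hypothetical.

```python
def absorb_output_add(rom, constant, width):
    """Fold `rom[addr] + constant` into the ROM contents (output-side adder)."""
    mask = (1 << width) - 1
    return [(value + constant) & mask for value in rom]


def absorb_input_add(rom, constant):
    """Fold `rom[addr + constant]` into the ROM contents (input-side adder).

    Assumes the address adder wraps modulo the ROM depth.
    """
    depth = len(rom)
    return [rom[(addr + constant) % depth] for addr in range(depth)]


if __name__ == "__main__":
    rom = [1, 2, 3, 4]
    print(absorb_output_add(rom, 3, width=8))  # adder after the ROM removed
    print(absorb_input_add(rom, 1))            # adder before the ROM removed
```

After either transformation a plain lookup in the updated ROM returns what the original ROM plus adder returned, so the adder can be dropped from the design.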
-
Publication No.: US10572409B1
Publication date: 2020-02-25
Application No.: US15976722
Filing date: 2018-05-10
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda, Ling Liu, Yifei Zhou, Ashish Sirasao
Abstract: A memory arrangement can store a matrix of matrix data elements specified as index-value pairs that indicate row and column indices and associated values. First split-and-merge circuitry is coupled between the memory arrangement and a first set of FIFO buffers for reading the matrix data elements from the memory arrangement and putting the matrix data elements in the first set of FIFO buffers based on column indices. A pairing circuit is configured to read vector data elements, pair the vector data elements with the matrix data elements, and put the paired matrix and vector data elements in a second set of FIFO buffers based on column indices. Second split-and-merge circuitry is configured to read paired matrix and vector data elements from the second set of FIFO buffers and put the paired matrix and vector data elements in a third set of FIFO buffers based on row indices.
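The data flow through the three sets of FIFO buffers can be modeled in Python. The sketch rests on several assumptions not stated in the abstract: elements are `(row, column, value)` tuples, a FIFO is chosen by index modulo the number of lanes, and the final stage multiplies and accumulates to form a sparse matrix-vector product. The name `sparse_matvec` is hypothetical.

```python
from collections import deque


def sparse_matvec(elements, vector, num_rows, num_lanes=2):
    """Route index-value pairs through column-, column-, then row-keyed FIFOs."""
    # First set of FIFOs: matrix elements distributed by column index.
    by_column = [deque() for _ in range(num_lanes)]
    for row, col, value in elements:
        by_column[col % num_lanes].append((row, col, value))

    # Second set of FIFOs: each element paired with its vector entry,
    # still keyed by column index.
    paired = [deque() for _ in range(num_lanes)]
    for lane, fifo in enumerate(by_column):
        while fifo:
            row, col, value = fifo.popleft()
            paired[lane].append((row, value, vector[col]))

    # Third set of FIFOs: pairs redistributed by row index.
    by_row = [deque() for _ in range(num_lanes)]
    for fifo in paired:
        while fifo:
            row, value, vec_value = fifo.popleft()
            by_row[row % num_lanes].append((row, value, vec_value))

    # Assumed consumer: multiply-accumulate into the result vector.
    result = [0] * num_rows
    for fifo in by_row:
        while fifo:
            row, value, vec_value = fifo.popleft()
            result[row] += value * vec_value
    return result


if __name__ == "__main__":
    matrix = [(0, 0, 2), (0, 2, 3), (1, 1, 4)]  # (row, column, value)
    print(sparse_matvec(matrix, [1, 10, 100], num_rows=2))
```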