-
11.
公开(公告)号:US10678509B1
公开(公告)日:2020-06-09
申请号:US16106743
申请日:2018-08-21
Applicant: Xilinx, Inc.
Inventor: Sean Settle , Elliott Delaye , Aaron Ng , Ehsan Ghasemi , Ashish Sirasao , Xiao Teng , Jindrich Zejda
Abstract: An example multiply accumulate (MACC) circuit includes a multiply-accumulator having an accumulator output register, a scaler, coupled to the multiply accumulator, and a control circuit coupled to the multiply-accumulator and the scaler. The control circuit is configured to provide control data to the scaler, the control data indicative of: a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register for implementing a first right shift; a multiplier; and a second right shift.
-
12.
公开(公告)号:US10460416B1
公开(公告)日:2019-10-29
申请号:US15786244
申请日:2017-10-17
Applicant: Xilinx, Inc.
Inventor: Ashish Sirasao , Elliott Delaye , Aaron Ng , Ehsan Ghasemi
IPC: G06T1/20 , G06T1/60 , H04N21/2381 , G06F3/03 , H03K19/177 , H04N5/30 , G06F12/00
Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; and control circuitry configured to generate addresses for the plurality of memory banks, control the multiplexer circuitry to select among outputs of the plurality of memory banks, control the first plurality of registers to store outputs of the second plurality of multiplexers, and control the second plurality of registers to store outputs of the first plurality of registers.
-
公开(公告)号:US10354733B1
公开(公告)日:2019-07-16
申请号:US15786321
申请日:2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Elliott Delaye , Ashish Sirasao , Yongjun Wu , Aaron Ng
Abstract: Methods and apparatus are described for partitioning and reordering block-based matrix multiplications for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). By preloading and hierarchically caching the blocks, examples of the present disclosure reduce the double data rate (DDR) memory intake bandwidth for software-defined GEMM accelerators.
-
公开(公告)号:US09836568B1
公开(公告)日:2017-12-05
申请号:US15069524
申请日:2016-03-14
Applicant: Xilinx, Inc.
Inventor: Ilya K. Ganusov , Aaron Ng , Ronald E. Plyler , Sabyasachi Das , Frederic Revenu
CPC classification number: G06F17/5072 , G06F17/5081
Abstract: Improving timing of a circuit design may include determining, using a processor, critical feed-forward paths of the circuit design, determining, using the processor, a sequential loop having a largest loop delay within the circuit design, and iteratively cutting, using the processor, the critical feed-forward paths and feed-forward paths parallel to the cut critical feed-forward paths until a stopping condition is met. The stopping condition may be determined according to the largest loop delay. The circuit design may be modified by inserting a register at each cut feed-forward path.
-
公开(公告)号:US09646126B1
公开(公告)日:2017-05-09
申请号:US14671920
申请日:2015-03-27
Applicant: Xilinx, Inc.
Inventor: Ruibing Lu , Zhiyong Wang , Aaron Ng , Sabyasachi Das
CPC classification number: G06F17/5068 , G06F17/5031 , G06F17/5077 , G06F2217/84
Abstract: Post-routing processing of a circuit design may include determining, using a processor, a baseline delay for a path of a routed circuit design, comparing, using the processor, the baseline delay of the path with a timing constraint of the path, and selectively applying, according to the comparing, a structural netlist optimization to the path resulting in an optimized path using a processor.
-
公开(公告)号:US12079158B2
公开(公告)日:2024-09-03
申请号:US17814817
申请日:2022-07-25
Applicant: Xilinx, Inc.
Inventor: Sanket Pandit , Jorn Tuyls , Xiao Teng , Rajeev Patwari , Ehsan Ghasemi , Elliott Delaye , Aaron Ng
CPC classification number: G06F15/8053 , G06F9/45533
Abstract: An integrated circuit includes a plurality of kernels and a virtual machine coupled to the plurality of kernels. The virtual machine is configured to interpret instructions directed to different ones of the plurality of kernels. The virtual machine is configured to control operation of the different ones of the plurality of kernels responsive to the instructions.
-
公开(公告)号:US20240045692A1
公开(公告)日:2024-02-08
申请号:US17818309
申请日:2022-08-08
Applicant: Xilinx, Inc.
Inventor: Xiao Teng , Tejus Siddagangaiah , Bryan Lozano , Ehsan Ghasemi , Rajeev Patwari , Elliott Delaye , Jorn Tuyls , Aaron Ng , Sanket Pandit , Pramod Peethambaran , Satyaprakash Pareek
CPC classification number: G06F9/3814 , G06F9/467 , G06F9/3004
Abstract: Controlling a data processing (DP) array includes creating a replica of a register address space of the DP array based on the design and the DP array. A sequence of instructions, including write instructions and read instructions, is received. The write instructions correspond to buffer descriptors specifying runtime data movements for a design for a DP array. The write instructions are converted into transaction instructions and the read instructions are converted into wait instructions based on the replica of the register address space. The transaction instructions and the wait instructions are included in an instruction buffer. The instruction buffer is provided to a microcontroller configured to execute the transaction instructions and the wait instructions to implement the runtime data movements for the design as implemented in the DP array. In another aspect, the instruction buffer is stored in a file for subsequent execution by the microcontroller.
-
公开(公告)号:US20230244966A1
公开(公告)日:2023-08-03
申请号:US17649912
申请日:2022-02-03
Applicant: Xilinx, Inc.
Inventor: Varun Sharma , Aaron Ng
Abstract: An inference server is capable of receiving a plurality of inference requests from one or more client systems. Each inference request specifies one of a plurality of different endpoints. The inference server can generate a plurality of batches each including one or more of the plurality of inference requests directed to a same endpoint. The inference server also can process the plurality of batches using a plurality of workers executing in an execution layer therein. Each batch is processed by a worker of the plurality of workers indicated by the endpoint of the batch.
-
公开(公告)号:US11568218B2
公开(公告)日:2023-01-31
申请号:US15786288
申请日:2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Jindrich Zejda , Elliott Delaye , Xiao Teng , Ashish Sirasao
Abstract: A disclosed neural network processing system includes a host computer system, a RAMs coupled to the host computer system, and neural network accelerators coupled to the RAMs, respectively. The host computer system is configured with software that when executed causes the host computer system to write input data and work requests to the RAMS. Each work request specifies a subset of neural network operations to perform and memory locations in a RAM of the input data and parameters. A graph of dependencies among neural network operations is built and additional dependencies added. The operations are partitioned into coarse grain tasks and fine grain subtasks for optimal scheduling for parallel execution. The subtasks are scheduled to accelerator kernels of matching capabilities. Each neural network accelerator is configured to read a work request from the respective RAM and perform the subset of neural network operations on the input data using the parameters.
-
公开(公告)号:US11386644B2
公开(公告)日:2022-07-12
申请号:US15786267
申请日:2017-10-17
Applicant: Xilinx, Inc.
Inventor: Elliott Delaye , Ashish Sirasao , Aaron Ng , Yongjun Wu , Jindrich Zejda
Abstract: An example preprocessor circuit includes: a first buffer configured to store rows of image data and output a row thereof; a second buffer, coupled to the first buffer, including storage locations to store respective image samples of the row output by the first buffer; shift registers; an interconnect network including connections, each connection coupling a respective one of the shift registers to more than one of the storage locations, one or more of the storage locations being coupled to more than one of the connections; and a control circuit configured to load the shift registers with the image samples based on the connections and shift the shift registers to output streams of image samples.
-
-
-
-
-
-
-
-
-