-
Publication No.: US11204747B1
Publication Date: 2021-12-21
Application No.: US15786395
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda, Elliott Delaye, Yongjun Wu, Aaron Ng, Ashish Sirasao, Khang K. Dao, Christopher J. Case
Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator that operate on two heterogeneous computing systems. For example, the neural network application may execute on a central processing unit (CPU) in a computing system while the neural network accelerator executes on an FPGA. As a result, when moving a software-hardware boundary between the two heterogeneous systems, changes may be made to both the neural network application (using software code) and to the accelerator (using RTL). The embodiments herein describe a software-defined approach where shared interface code is used to express both sides of the interface between the two heterogeneous systems in a single abstraction (e.g., a software class).
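To make the single-abstraction idea concrete, here is a minimal C++ sketch (not the patent's actual code): one class holds the interface definition, from which both the host-side argument packing and an accelerator-side register map are derived. The class, field, and function names (`ConvLayerInterface`, `pack`, `emitRegisterMap`) are invented for illustration.

```cpp
// Hypothetical sketch: one class describes both sides of the CPU/FPGA
// interface, so moving the software-hardware boundary means editing a
// single definition instead of separate C++ and RTL code.
#include <cstdint>
#include <iostream>
#include <vector>

struct Field {
    const char* name;
    uint32_t offset;  // byte offset in the accelerator's register map
};

class ConvLayerInterface {
public:
    // Single source of truth for the interface layout.
    static const std::vector<Field>& fields() {
        static const std::vector<Field> f = {
            {"input_addr", 0x00}, {"output_addr", 0x04}, {"num_rows", 0x08}};
        return f;
    }

    // Host (CPU) side: pack arguments in register-map order.
    static std::vector<uint32_t> pack(uint32_t in, uint32_t out, uint32_t rows) {
        return {in, out, rows};
    }

    // Hardware side: emit a register map an RTL generator could consume.
    static void emitRegisterMap() {
        for (const auto& f : fields())
            std::cout << f.name << " @ 0x" << std::hex << f.offset << "\n";
    }
};

int main() {
    ConvLayerInterface::emitRegisterMap();                      // accelerator view
    auto args = ConvLayerInterface::pack(0x1000, 0x2000, 224);  // host view
    std::cout << std::dec << args.size() << " words to write\n";
}
```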
-
Publication No.: US10366201B1
Publication Date: 2019-07-30
Application No.: US15495760
Filing Date: 2017-04-24
Applicant: Xilinx, Inc.
Inventor: Aaron Ng, Sridhar Krishnamurthy, Grigor S. Gasparyan
Abstract: Closing timing for a circuit design can include displaying, using a display device, a first region having a plurality of controls corresponding to a plurality of data sets generated at different times during a phase of a design flow for the circuit design, wherein each control selects a data set associated with the control, and displaying, using the display device, a second region configured to display a list of critical paths for data sets selected from the first region using one of the plurality of controls. Closing timing further can include displaying, using the display device, a third region configured to display a representation of a target integrated circuit including layouts for the critical paths of the list for implementations of the circuit design specified by the selected data sets.
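The three-region view can be pictured with a small data model. The sketch below is an assumption-laden illustration, not the tool's implementation: selecting a data set (region 1) determines the critical-path list (region 2), which in turn drives the layout rendering (region 3, stubbed as a comment).

```cpp
// Hypothetical data model behind the three-region timing-closure view:
// region 1 selects a data set, region 2 lists its critical paths, and
// region 3 renders the corresponding layout on the target device.
#include <iostream>
#include <string>
#include <vector>

struct CriticalPath {
    std::string name;
    double slackNs;  // negative slack means the path fails timing
};

struct DataSet {
    std::string capturedAt;           // when in the design flow it was taken
    std::vector<CriticalPath> paths;  // region 2 content
};

int main() {
    std::vector<DataSet> runs = {
        {"post-place", {{"clk->ram", -0.12}, {"dsp->reg", 0.05}}},
        {"post-route", {{"clk->ram", -0.03}}}};

    size_t selected = 1;  // user clicks the second control in region 1
    for (const auto& p : runs[selected].paths)
        std::cout << p.name << " slack " << p.slackNs << " ns\n";
    // Region 3 would draw each listed path's layout on the device view here.
}
```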
-
Publication No.: US20190114499A1
Publication Date: 2019-04-18
Application No.: US15786267
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Elliott Delaye, Ashish Sirasao, Aaron Ng, Yongjun Wu, Jindrich Zejda
Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a first buffer configured to store a plurality of rows of the image data and output a row of the plurality of rows; a second buffer, coupled to the first buffer, including a plurality of storage locations to store a respective plurality of image samples of the row output by the first buffer; a plurality of shift registers; an interconnect network including a plurality of connections, each connection coupling a respective one of the plurality of shift registers to more than one of the plurality of storage locations, one or more of the plurality of storage locations being coupled to more than one of the plurality of connections; and a control circuit configured to load the plurality of shift registers with the plurality of image samples based on the plurality of connections and shift the plurality of shift registers to output the plurality of streams of image samples.
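A behavioral (non-RTL) analogue may help clarify the datapath: a buffered row feeds a line of storage locations, a fixed connection network loads each shift register from several overlapping locations, and shifting the registers emits one sample per stream per cycle. All sizes and the connection pattern below are illustrative assumptions.

```cpp
// Behavioral sketch (not RTL) of the preprocessor: a row buffer feeds a
// line of storage locations; a fixed connection network loads each shift
// register from several (possibly shared) locations; shifting the
// registers then emits parallel sample streams.
#include <array>
#include <deque>
#include <iostream>
#include <vector>

int main() {
    std::array<int, 6> row = {10, 11, 12, 13, 14, 15};  // one buffered image row

    // Each shift register is wired to a window of storage locations;
    // locations 1..4 feed more than one connection (the overlap the
    // abstract describes).
    std::vector<std::vector<int>> connections = {
        {0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5}};

    // Control step 1: load each shift register over its connections.
    std::vector<std::deque<int>> shiftRegs(connections.size());
    for (size_t r = 0; r < connections.size(); ++r)
        for (int loc : connections[r])
            shiftRegs[r].push_back(row[loc]);

    // Control step 2: shift all registers, emitting one sample per
    // stream per cycle.
    while (!shiftRegs[0].empty()) {
        for (auto& sr : shiftRegs) {
            std::cout << sr.front() << ' ';
            sr.pop_front();
        }
        std::cout << '\n';
    }
}
```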
-
Publication No.: US12248786B2
Publication Date: 2025-03-11
Application No.: US17818309
Filing Date: 2022-08-08
Applicant: Xilinx, Inc.
Inventor: Xiao Teng, Tejus Siddagangaiah, Bryan Lozano, Ehsan Ghasemi, Rajeev Patwari, Elliott Delaye, Jorn Tuyls, Aaron Ng, Sanket Pandit, Pramod Peethambaran, Satyaprakash Pareek
Abstract: Controlling a data processing (DP) array includes creating a replica of a register address space of the DP array based on a design for the DP array and the DP array itself. A sequence of instructions, including write instructions and read instructions, is received. The write instructions correspond to buffer descriptors specifying runtime data movements for the design. The write instructions are converted into transaction instructions and the read instructions are converted into wait instructions based on the replica of the register address space. The transaction instructions and the wait instructions are included in an instruction buffer. The instruction buffer is provided to a microcontroller configured to execute the transaction instructions and the wait instructions to implement the runtime data movements for the design as implemented in the DP array. In another aspect, the instruction buffer is stored in a file for subsequent execution by the microcontroller.
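The write-to-transaction and read-to-wait conversion might look like the following sketch, where a `std::map` stands in for the register-space replica. The instruction encoding and opcode names are invented, not taken from the patent.

```cpp
// Hypothetical sketch of the conversion step: a replica of the DP
// array's register address space lets the tool turn raw writes into
// transaction instructions and raw reads into wait instructions that a
// microcontroller can later replay.
#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

enum class Op : uint8_t { Transaction, Wait };

struct Instr {
    Op op;
    uint32_t addr;
    uint32_t value;  // data to write, or register value to wait for
};

int main() {
    std::map<uint32_t, uint32_t> replica;  // shadow of the register space
    std::vector<Instr> buffer;             // instruction buffer for the MCU

    // A raw write becomes a transaction instruction and updates the replica.
    auto write = [&](uint32_t a, uint32_t v) {
        replica[a] = v;
        buffer.push_back({Op::Transaction, a, v});
    };
    // A raw read becomes "wait until this register holds its expected
    // value", with the expectation taken from the replica.
    auto read = [&](uint32_t a) {
        buffer.push_back({Op::Wait, a, replica[a]});
    };

    replica[0x104] = 1;  // assumed metadata: the "done" register reads back 1
    write(0x100, 1);     // e.g., start a buffer-descriptor transfer
    read(0x104);         // e.g., poll the done flag

    for (const auto& i : buffer)
        std::cout << (i.op == Op::Transaction ? "txn  0x" : "wait 0x")
                  << std::hex << i.addr << " -> " << i.value << '\n';
}
```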
-
Publication No.: US20240069511A1
Publication Date: 2024-02-29
Application No.: US17823902
Filing Date: 2022-08-31
Applicant: Xilinx, Inc.
Inventor: Jorn Tuyls, Xiao Teng, Sanket Pandit, Rajeev Patwari, Qian Zhou, Ehsan Ghasemi, Ephrem C. Wu, Elliott Delaye, Aaron Ng
IPC: G05B19/042
CPC classification number: G05B19/042, G05B2219/25255, G05B2219/25257
Abstract: Instruction generation for a data processing array and microcontroller includes generating a tensor-level intermediate representation from a machine learning model using kernel expressions. Statements of the tensor-level intermediate representation are partitioned into a first set of statements and a second set of statements. From the first set of statements, kernel instructions are generated based on a reconfigurable neural engine model. The kernel instructions are executable by a compute tile of a data processing array to implement compute functions of the machine learning model. From the second set of statements, microcontroller instructions are generated based on a super-graph model. The microcontroller instructions are executable by a microcontroller of the data processing array to move data into and out from the data processing array.
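A toy version of the partitioning step is sketched below; the IR statements, the compute/data-movement split, and the instruction strings are all invented placeholders for the reconfigurable-neural-engine and super-graph lowering the abstract describes.

```cpp
// Hypothetical sketch of the partitioning step: statements of a
// tensor-level IR are split into compute statements (lowered to kernel
// instructions for the compute tiles) and data-movement statements
// (lowered to microcontroller instructions).
#include <iostream>
#include <string>
#include <vector>

struct Stmt {
    std::string text;
    bool isCompute;  // true: kernel expression, false: data movement
};

int main() {
    std::vector<Stmt> ir = {
        {"dma_in A", false}, {"matmul C=A*B", true},
        {"relu C", true},    {"dma_out C", false}};

    std::vector<std::string> kernelInstrs, mcuInstrs;
    for (const auto& s : ir) {
        if (s.isCompute)
            kernelInstrs.push_back("KERNEL " + s.text);  // via the RNE model
        else
            mcuInstrs.push_back("MCU " + s.text);        // via the super-graph
    }
    for (const auto& k : kernelInstrs) std::cout << k << '\n';
    for (const auto& m : mcuInstrs) std::cout << m << '\n';
}
```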
-
Publication No.: US20240028556A1
Publication Date: 2024-01-25
Application No.: US17814817
Filing Date: 2022-07-25
Applicant: Xilinx, Inc.
Inventor: Sanket Pandit, Jorn Tuyls, Xiao Teng, Rajeev Patwari, Ehsan Ghasemi, Elliott Delaye, Aaron Ng
CPC classification number: G06F15/8053, G06F9/45533
Abstract: An integrated circuit includes a plurality of kernels and a virtual machine coupled to the plurality of kernels. The virtual machine is configured to interpret instructions directed to different ones of the plurality of kernels. The virtual machine is configured to control operation of the different ones of the plurality of kernels responsive to the instructions.
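The interpreter idea can be shown in a few lines: the virtual machine decodes each instruction's kernel id and forwards the operation to that kernel's handler. The encoding and kernel set below are assumptions for illustration.

```cpp
// Minimal sketch of a VM that interprets instructions directed to
// different kernels and controls each kernel in response.
#include <functional>
#include <iostream>
#include <vector>

struct Instruction {
    int kernelId;
    int opcode;  // meaning is defined per kernel
};

int main() {
    // One handler per hardware kernel on the integrated circuit.
    std::vector<std::function<void(int)>> kernels = {
        [](int op) { std::cout << "conv kernel: op " << op << '\n'; },
        [](int op) { std::cout << "pool kernel: op " << op << '\n'; }};

    std::vector<Instruction> program = {{0, 1}, {1, 2}, {0, 3}};

    // The VM loop: interpret each instruction and control the kernel
    // it targets.
    for (const auto& insn : program)
        kernels[insn.kernelId](insn.opcode);
}
```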
-
Publication No.: US11620490B2
Publication Date: 2023-04-04
Application No.: US15785800
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng, Elliott Delaye, Ehsan Ghasemi, Xiao Teng, Jindrich Zejda, Yongjun Wu, Sean Settle, Ashish Sirasao
Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of the weight matrices in the shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
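A host-side sketch of assembling the instruction package might look like this; the `LayerInstr` layout is hypothetical, standing in for whatever per-layer instruction format the accelerator actually consumes.

```cpp
// Hypothetical host-side sketch: weight matrices are placed in a shared
// buffer at increasing offsets, and each per-layer instruction records
// its layer's weight offset so the accelerator can locate it.
#include <cstdint>
#include <iostream>
#include <vector>

struct LayerInstr {
    uint32_t layer;
    uint32_t weightOffset;  // byte offset of this layer's weights
    uint32_t weightBytes;
};

int main() {
    std::vector<uint32_t> weightSizes = {4096, 16384, 8192};  // one per layer

    std::vector<LayerInstr> package;  // the per-layer instruction package
    uint32_t offset = 0;
    for (uint32_t layer = 0; layer < weightSizes.size(); ++layer) {
        // The host writes this layer's weight matrix at `offset` (copy
        // elided here) and records that offset in the layer's instruction.
        package.push_back({layer, offset, weightSizes[layer]});
        offset += weightSizes[layer];
    }

    // The assembled package is then written to shared memory alongside
    // the input data for the accelerator to fetch and execute.
    for (const auto& li : package)
        std::cout << "layer " << li.layer << ": weights @ byte "
                  << li.weightOffset << ", " << li.weightBytes << " bytes\n";
}
```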
-
Publication No.: US11429848B2
Publication Date: 2022-08-30
Application No.: US15786102
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng, Elliott Delaye, Jindrich Zejda, Ashish Sirasao
Abstract: In disclosed approaches of neural network processing, a host computer system copies an input data matrix from host memory to a shared memory for performing neural network operations of a first layer of a neural network by a neural network accelerator. The host instructs the neural network accelerator to perform neural network operations of each layer of the neural network beginning with the input data matrix. The neural network accelerator performs neural network operations of each layer in response to the instruction from the host. The host waits until the neural network accelerator signals completion of performing neural network operations of layer i before instructing the neural network accelerator to commence performing neural network operations of layer i+1, for i≥1. The host instructs the neural network accelerator to use a results data matrix in the shared memory from layer i as an input data matrix for layer i+1 for i≥1.
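The host's control loop reduces to a simple pattern: issue layer i, block until completion, then reuse layer i's result matrix as layer i+1's input. In the mock below the accelerator is a plain function and shared memory is a vector; the placeholder math is arbitrary.

```cpp
// Sketch of the host's layer-by-layer control loop. In the real system
// the host would write a command to the accelerator and wait for its
// completion signal instead of calling a function.
#include <iostream>
#include <vector>

// Stand-in for the accelerator processing one layer in shared memory:
// reads the input region, writes the result region, then "signals".
std::vector<float> runLayerOnAccelerator(int layer,
                                         const std::vector<float>& input) {
    std::vector<float> result(input.size());
    for (size_t i = 0; i < input.size(); ++i)
        result[i] = input[i] * 0.5f + layer;  // placeholder math
    return result;
}

int main() {
    std::vector<float> data = {1.f, 2.f, 3.f};  // copied into shared memory
    const int numLayers = 3;

    for (int layer = 1; layer <= numLayers; ++layer) {
        // Host instructs the accelerator to run layer i, then blocks
        // until completion is signaled before issuing layer i+1; layer
        // i's result matrix becomes layer i+1's input matrix.
        data = runLayerOnAccelerator(layer, data);
    }
    for (float v : data) std::cout << v << ' ';
    std::cout << '\n';
}
```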
-
Publication No.: US10936311B1
Publication Date: 2021-03-02
Application No.: US16505987
Filing Date: 2019-07-09
Applicant: Xilinx, Inc.
Inventor: Ling Liu, Yifei Zhou, Xiao Teng, Ashish Sirasao, Chuanhua Song, Aaron Ng
Abstract: Disclosed approaches for multiplying a sparse matrix by a dense vector or matrix include first memory banks for storage of column indices, second memory banks for storage of row indices, and third memory banks for storage of non-zero values of a sparse matrix. A pairing circuit distributes an input stream of vector elements across first first-in-first-out (FIFO) buffers according to the buffered column indices. Multiplication circuitry multiplies vector elements output from the first FIFO buffers by corresponding ones of the non-zero values from the third memory banks, and stores products in second FIFO buffers. Row-aligner circuitry organizes the products output from the second FIFO buffers into third FIFO buffers according to row indices in the second memory banks. Accumulation circuitry accumulates respective totals from products output from the third FIFO buffers.
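A software analogue with the FIFO stages collapsed into plain loops illustrates the dataflow: pair each non-zero with its vector element by column index, multiply, then align and accumulate by row. Bank and FIFO sizing are omitted.

```cpp
// Software analogue of the pipeline above: the (row, col, value)
// triplets stand in for the row-index, column-index, and non-zero-value
// memory banks; the stages map onto the pairing, multiplication,
// row-alignment, and accumulation circuitry.
#include <iostream>
#include <vector>

int main() {
    std::vector<int> rows = {0, 0, 1, 2};
    std::vector<int> cols = {0, 2, 1, 2};
    std::vector<double> vals = {2.0, 3.0, 4.0, 5.0};

    std::vector<double> x = {1.0, 10.0, 100.0};  // dense input vector
    std::vector<double> y(3, 0.0);               // accumulated row totals

    for (size_t k = 0; k < vals.size(); ++k) {
        double paired = x[cols[k]];         // pairing stage: select by column
        double product = paired * vals[k];  // multiplication stage
        y[rows[k]] += product;              // row alignment + accumulation
    }
    for (double v : y) std::cout << v << ' ';  // expect: 302 40 500
    std::cout << '\n';
}
```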
-
Publication No.: US10515135B1
Publication Date: 2019-12-24
Application No.: US15785688
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda, Elliott Delaye, Aaron Ng, Ashish Sirasao, Yongjun Wu
Abstract: Methods and apparatus are described for performing data-intensive compute algorithms, such as fast massively parallel general matrix multiplication (GEMM), using a particular data format for both storing data to and reading data from memory. This data format may be utilized for arbitrarily-sized input matrices for GEMM implemented on a finite-size GEMM accelerator in the form of a rectangular compute array of digital signal processing (DSP) elements or similar compute cores. This data format alleviates the double data rate (DDR) dynamic random access memory (DRAM) bandwidth problem by allowing both linear DDR addressing and single-cycle loading of data into the compute array, avoiding input/output (I/O) and/or DDR bottlenecks.
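One way to realize such a format is a block-major (tile-by-tile) layout, sketched below under the assumption of a 4x4 tile matching the compute array dimension; elements of a tile then occupy a contiguous span, so a linear DDR burst fills the array without strided access. This illustrates the general idea, not the patent's exact format.

```cpp
// Sketch of a block-major layout: the matrix is stored tile by tile so
// tiles sit at consecutive addresses (linear DDR addressing) and one
// tile can be loaded into the compute array in a single burst.
#include <cstddef>
#include <iostream>

constexpr size_t TILE = 4;  // assumed to match the compute array dimension

// Linear offset of element (r, c) of an M x N matrix stored block-major.
size_t blockMajorOffset(size_t r, size_t c, size_t N) {
    size_t tileRow = r / TILE, tileCol = c / TILE;
    size_t tilesPerRow = N / TILE;  // assume N is a multiple of TILE
    size_t tileBase = (tileRow * tilesPerRow + tileCol) * TILE * TILE;
    return tileBase + (r % TILE) * TILE + (c % TILE);
}

int main() {
    const size_t N = 8;  // 8x8 matrix, four 4x4 tiles
    // Each tile occupies a contiguous 16-word span, so loading it needs
    // no strided accesses.
    std::cout << blockMajorOffset(0, 0, N) << ' '   // 0:  start of tile (0,0)
              << blockMajorOffset(0, 4, N) << ' '   // 16: start of tile (0,1)
              << blockMajorOffset(4, 0, N) << '\n'; // 32: start of tile (1,0)
}
```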