Patent search ap:("Xilinx Page Inc.") AND inv:"Xiao Teng"

11.

Invention publication
HARDWARE ACCELERATION OF MACHINE LEARNING DESIGNS (status: under examination, published)

Publication (announcement) number: US20230401480A1

Publication (announcement) date: 2023-12-14

Application number: US17806906

Application date: 2022-06-14

Applicant: Xilinx, Inc.

Inventor: Ehsan Ghasemi, Rajeev Patwari, Elliott Delaye, Jorn Tuyls, Ephrem C. Wu, Xiao Teng, Sanket Pandit

IPC: G06N20/00

CPC classification number: G06N20/00

Abstract: Hardware acceleration of machine learning (ML) designs includes translating an ML primitive into an intermediate representation. The intermediate representation is subdivided to specify a functional compute block. The functional compute block is sized according to a compute node primitive adapted for implementing the ML primitive on target hardware. An overlay is generated for the ML primitive, at least in part, by mapping the functional compute block to the compute node primitive. The overlay is synthesizable to implement the ML primitive on the target hardware. The overlay can be scheduled for operation within the target hardware as part of an ML design including the ML primitive.

12.

Invention grant
Neural network processing system having host controlled kernel accelerators (status: in force)

Publication (announcement) number: US11568218B2

Publication (announcement) date: 2023-01-31

Application number: US15786288

Application date: 2017-10-17

Applicant: Xilinx, Inc.

Inventor: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Ashish Sirasao

IPC: G06N3/063, G06N3/04

Abstract: A disclosed neural network processing system includes a host computer system, RAMs coupled to the host computer system, and neural network accelerators coupled to the RAMs, respectively. The host computer system is configured with software that when executed causes the host computer system to write input data and work requests to the RAMs. Each work request specifies a subset of neural network operations to perform and memory locations in a RAM of the input data and parameters. A graph of dependencies among neural network operations is built and additional dependencies added. The operations are partitioned into coarse grain tasks and fine grain subtasks for optimal scheduling for parallel execution. The subtasks are scheduled to accelerator kernels of matching capabilities. Each neural network accelerator is configured to read a work request from the respective RAM and perform the subset of neural network operations on the input data using the parameters.

13.

Invention grant
Instruction set architecture for data processing array control (status: in force)

Publication (announcement) number: US12248786B2

Publication (announcement) date: 2025-03-11

Application number: US17818309

Application date: 2022-08-08

Applicant: Xilinx, Inc.

Inventor: Xiao Teng, Tejus Siddagangaiah, Bryan Lozano, Ehsan Ghasemi, Rajeev Patwari, Elliott Delaye, Jorn Tuyls, Aaron Ng, Sanket Pandit, Pramod Peethambaran, Satyaprakash Pareek

IPC: G06F9/30, G06F9/38, G06F9/46

Abstract: Controlling a data processing (DP) array includes creating a replica of a register address space of the DP array based on the design and the DP array. A sequence of instructions, including write instructions and read instructions, is received. The write instructions correspond to buffer descriptors specifying runtime data movements for a design for a DP array. The write instructions are converted into transaction instructions and the read instructions are converted into wait instructions based on the replica of the register address space. The transaction instructions and the wait instructions are included in an instruction buffer. The instruction buffer is provided to a microcontroller configured to execute the transaction instructions and the wait instructions to implement the runtime data movements for the design as implemented in the DP array. In another aspect, the instruction buffer is stored in a file for subsequent execution by the microcontroller.

14.

Invention publication
INSTRUCTION GENERATION AND PROGRAMMING MODEL FOR A DATA PROCESSING ARRAY AND MICROCONTROLLER (status: under examination, published)

Publication (announcement) number: US20240069511A1

Publication (announcement) date: 2024-02-29

Application number: US17823902

Application date: 2022-08-31

Applicant: Xilinx, Inc.

Inventor: Jorn Tuyls, Xiao Teng, Sanket Pandit, Rajeev Patwari, Qian Zhou, Ehsan Ghasemi, Ephrem C. Wu, Elliott Delaye, Aaron Ng

IPC: G05B19/042

CPC classification number: G05B19/042, G05B2219/25255, G05B2219/25257

Abstract: Instruction generation for a data processing array and microcontroller includes generating a tensor-level intermediate representation from a machine learning model using kernel expressions. Statements of the tensor-level intermediate representation are partitioned into a first set of statements and a second set of statements. From the first set of statements, kernel instructions are generated based on a reconfigurable neural engine model. The kernel instructions are executable by a compute tile of a data processing array to implement compute functions of the machine learning model. From the second set of statements, microcontroller instructions are generated based on a super-graph model. The microcontroller instructions are executable by a microcontroller of the data processing array to move data into and out from the data processing array.

15.

Invention publication
RECONFIGURABLE NEURAL ENGINE WITH EXTENSIBLE INSTRUCTION SET ARCHITECTURE (status: under examination, published)

Publication (announcement) number: US20240028556A1

Publication (announcement) date: 2024-01-25

Application number: US17814817

Application date: 2022-07-25

Applicant: Xilinx, Inc.

Inventor: Sanket Pandit, Jorn Tuyls, Xiao Teng, Rajeev Patwari, Ehsan Ghasemi, Elliott Delaye, Aaron Ng

IPC: G06F15/80, G06F9/455

CPC classification number: G06F15/8053, G06F9/45533

Abstract: An integrated circuit includes a plurality of kernels and a virtual machine coupled to the plurality of kernels. The virtual machine is configured to interpret instructions directed to different ones of the plurality of kernels. The virtual machine is configured to control operation of the different ones of the plurality of kernels responsive to the instructions.

16.

Invention grant
Multi-layer neural network processing by a neural network accelerator using host communicated merged weights and a package of per-layer instructions (status: in force)

Publication (announcement) number: US11620490B2

Publication (announcement) date: 2023-04-04

Application number: US15785800

Application date: 2017-10-17

Applicant: Xilinx, Inc.

Inventor: Aaron Ng, Elliott Delaye, Ehsan Ghasemi, Xiao Teng, Jindrich Zejda, Yongjun Wu, Sean Settle, Ashish Sirasao

IPC: G06N3/04, G06N3/08, G06N3/063

Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
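The host-side packaging step in this abstract can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the dictionary fields (`layer`, `weight_offset`, `weight_size`), and the use of a single byte string to stand in for the shared memory are all hypothetical and are not taken from the patent.

```python
def build_package(layers):
    """Merge per-layer weights and build one instruction per layer.

    `layers` is a list of (layer_name, weight_bytes) pairs (hypothetical format).
    Returns the merged weight blob and the list of per-layer instructions.
    """
    merged = bytearray()
    instructions = []
    for name, weights in layers:
        # Each instruction records where its layer's weights start in the blob.
        instructions.append({
            "layer": name,
            "weight_offset": len(merged),
            "weight_size": len(weights),
        })
        merged += weights
    return bytes(merged), instructions


def run_package(shared_weights, instructions):
    """Accelerator side: walk the package and fetch each layer's weights."""
    fetched = []
    for ins in instructions:
        start = ins["weight_offset"]
        end = start + ins["weight_size"]
        fetched.append((ins["layer"], shared_weights[start:end]))
    return fetched


# Two toy layers with 2 and 3 bytes of weights.
merged, package = build_package([("conv1", b"\x01\x02"), ("fc1", b"\x03\x04\x05")])
fetched = run_package(merged, package)
```

Here the second instruction carries offset 2, so the accelerator side can locate the `fc1` weights without the host sending a separate message per layer.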

17.

Invention grant
Sparse matrix processing circuitry (status: in force)

Publication (announcement) number: US10936311B1

Publication (announcement) date: 2021-03-02

Application number: US16505987

Application date: 2019-07-09

Applicant: Xilinx, Inc.

Inventor: Ling Liu, Yifei Zhou, Xiao Teng, Ashish Sirasao, Chuanhua Song, Aaron Ng

IPC: G06F9/30, G06F17/16

Abstract: Disclosed approaches for multiplying a sparse matrix by a dense vector or matrix include first memory banks for storage of column indices, second memory banks for storage of row indices, and third memory banks for storage of non-zero values of a sparse matrix. A pairing circuit distributes an input stream of vector elements across first first-in-first-out (FIFO) buffers according to the buffered column indices. Multiplication circuitry multiplies vector elements output from the first FIFO buffers by corresponding ones of the non-zero values from the third memory banks, and stores products in second FIFO buffers. Row-aligner circuitry organizes the products output from the second FIFO buffers into third FIFO buffers according to row indices in the second memory banks. Accumulation circuitry accumulates respective totals from products output from the third FIFO buffers.
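The arithmetic described in this abstract matches a sparse matrix-vector product over separately stored column indices, row indices, and non-zero values (coordinate format). A minimal software sketch follows; it models only the data flow (pair, multiply, align by row, accumulate), not the memory banks or FIFO buffers, and the function name `coo_spmv` and the example matrix are assumptions made for illustration.

```python
def coo_spmv(row_idx, col_idx, values, x, n_rows):
    """Multiply a sparse matrix in coordinate form by a dense vector x."""
    # Pairing and multiplication: each non-zero value is multiplied by the
    # vector element selected by its column index; keep the row index with it.
    products = [(r, v * x[c]) for r, c, v in zip(row_idx, col_idx, values)]

    # Row alignment and accumulation: sum the products that share a row index.
    y = [0] * n_rows
    for r, p in products:
        y[r] += p
    return y


# Example: [[2, 0, 0], [0, 0, 3], [4, 0, 5]] times [1, 2, 3].
row_idx = [0, 1, 2, 2]
col_idx = [0, 2, 0, 2]
values = [2, 3, 4, 5]
y = coo_spmv(row_idx, col_idx, values, [1, 2, 3], 3)
print(y)  # [2, 9, 19]
```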

18.

Invention application
NEURAL NETWORK PROCESSING SYSTEM HAVING MULTIPLE PROCESSORS AND A NEURAL NETWORK ACCELERATOR (status: under examination, published)

Publication (announcement) number: US20190114534A1

Publication (announcement) date: 2019-04-18

Application number: US15785685

Application date: 2017-10-17

Applicant: Xilinx, Inc.

Inventor: Xiao Teng, Aaron Ng, Ashish Sirasao, Elliott Delaye

IPC: G06N3/063, G06N3/08, G06N3/04

Abstract: At least one neural network accelerator performs operations of a first subset of layers of a neural network on an input data set, generates an intermediate data set, and stores the intermediate data set in a shared memory queue in a shared memory. A first processor element of a host computer system provides input data to the neural network accelerator and signals the neural network accelerator to perform the operations of the first subset of layers of the neural network on the input data set. A second processor element of the host computer system reads the intermediate data set from the shared memory queue, performs operations of a second subset of layers of the neural network on the intermediate data set, and generates an output data set while the neural network accelerator is performing the operations of the first subset of layers of the neural network on another input data set.
