-
Publication Number: US20190114533A1
Publication Date: 2019-04-18
Application Number: US15785679
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Sonal Santan, Soren T. Soe, Ashish Sirasao, Ehsan Ghasemi, Sean Settle
Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., an FPGA. The library operates a pipeline for submitting tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage, each of which corresponds to a different thread. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the task. Because the stages correspond to different threads, the library can process multiple packets in parallel, which can increase the utilization of the neural network accelerator on the hardware system.
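To make the pipelining concrete, below is a minimal C++ sketch of a three-stage, thread-per-stage pipeline of the kind this abstract describes. All names (Packet, BlockingQueue, the stage bodies) are illustrative assumptions rather than the library's actual API, and the FPGA stage is stubbed with a placeholder computation.

```cpp
// Minimal sketch: three stages on three threads, packets handed between
// stages through thread-safe queues. Not the patented library's API.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

struct Packet {                 // carries everything each stage needs
    int task_id;
    std::vector<float> data;    // input, then intermediate, then output
};

template <typename T>
class BlockingQueue {           // simple thread-safe hand-off between stages
public:
    void push(std::optional<T> v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    std::optional<T> pop() {    // an empty optional signals shutdown
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty(); });
        auto v = std::move(q_.front());
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::optional<T>> q_;
};

int main() {
    BlockingQueue<Packet> to_fpga, to_post;

    std::thread pre([&] {       // pre-processing stage
        for (int id = 0; id < 4; ++id)
            to_fpga.push(Packet{id, {1.f, 2.f, 3.f}});
        to_fpga.push(std::nullopt);
    });
    std::thread fpga([&] {      // stands in for the FPGA execution stage
        while (auto p = to_fpga.pop()) {
            for (float& x : p->data) x *= 2.f;   // placeholder "kernel"
            to_post.push(std::move(*p));
        }
        to_post.push(std::nullopt);
    });
    std::thread post([&] {      // post-processing stage
        while (auto p = to_post.pop())
            std::cout << "task " << p->task_id << " done\n";
    });

    pre.join(); fpga.join(); post.join();
}
```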
-
Publication Number: US20180203956A1
Publication Date: 2018-07-19
Application Number: US15407875
Application Date: 2017-01-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng, Sabyasachi Das, Prabal Basu
IPC: G06F17/50
CPC classification number: G06F17/505, G06F17/5054, G06F17/5068, G06F17/5081, G06F2217/78, G06F2217/84
Abstract: Physical synthesis for a circuit design can include determining, using a processor, features relating to a signal path of the circuit design not meeting a timing requirement, processing the features through a first neural network model using the processor, wherein the first neural network model is trained to indicate effectiveness of a first physical synthesis optimization, and selectively performing, using the processor, the first physical synthesis optimization for the signal path based upon a result from the first neural network model.
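As a rough illustration of the gating idea, the sketch below feeds features of a failing path to a stand-in model and runs an optimization pass only when the predicted effectiveness clears a threshold. The feature set, the tiny logistic "model", and try_restructure() are assumptions; the patent's model is a trained neural network, not these hand-picked weights.

```cpp
// Hedged sketch: run a physical-synthesis pass only when a model
// predicts it will help the failing path.
#include <array>
#include <cmath>
#include <iostream>

struct PathFeatures {            // assumed features of a failing signal path
    float slack_ns;              // negative slack: how far timing is missed
    float fanout;
    float logic_levels;
};

// Stand-in for the trained model: probability the pass fixes this path.
float predict_effectiveness(const PathFeatures& f) {
    const std::array<float, 3> w{-1.2f, 0.05f, 0.3f};   // illustrative weights
    float z = w[0] * f.slack_ns + w[1] * f.fanout + w[2] * f.logic_levels;
    return 1.f / (1.f + std::exp(-z));                  // logistic output
}

void try_restructure(const PathFeatures&) { /* placeholder pass */ }

int main() {
    PathFeatures path{-0.8f, 24.f, 9.f};   // path failing timing by 0.8 ns
    float p = predict_effectiveness(path);
    if (p > 0.5f) {                        // selectively perform the pass
        std::cout << "model predicts effective (p=" << p << "), running pass\n";
        try_restructure(path);
    } else {
        std::cout << "skipping pass (p=" << p << ")\n";
    }
}
```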
-
Publication Number: US11694066B2
Publication Date: 2023-07-04
Application Number: US15785679
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Sonal Santan, Soren T. Soe, Ashish Sirasao, Ehsan Ghasemi, Sean Settle
Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., an FPGA. The library operates a pipeline for submitting tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage, each of which corresponds to a different thread. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the task. Because the stages correspond to different threads, the library can process multiple packets in parallel, which can increase the utilization of the neural network accelerator on the hardware system.
-
Publication Number: US11036827B1
Publication Date: 2021-06-15
Application Number: US15786346
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda, Elliott Delaye, Yongjun Wu, Aaron Ng, Ashish Sirasao, Khang K. Dao
Abstract: Methods and apparatus are described for simultaneously buffering and reformatting (e.g., transposing) a matrix for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). Examples of the present disclosure increase the effective double data rate (DDR) memory throughput for streaming data into a GEMM digital signal processing (DSP) engine multifold and eliminate slow data reformatting on a host central processing unit (CPU). This may be accomplished through software-defined (e.g., C++) data structures and access patterns that result in hardware logic that simultaneously buffers and reorganizes the data to achieve linear DDR addressing.
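A behavioral C++ sketch of the buffer-and-reformat idea follows: a tile is transposed as it is written into the buffer, so a downstream consumer can stream it with unit-stride (linear) addresses. The tile sizes and layout are assumptions; in the disclosed approach, a loop nest like this would be expressed as software-defined data structures that synthesize into hardware logic.

```cpp
// Behavioral sketch: buffer a row-major tile transposed, so that reading
// the buffer sequentially walks the matrix column by column -- one linear
// address stream instead of a strided one.
#include <cstddef>
#include <iostream>
#include <vector>

constexpr std::size_t ROWS = 4, COLS = 8;   // assumed tile shape

std::vector<float> buffer_transposed(const std::vector<float>& src) {
    std::vector<float> buf(ROWS * COLS);
    for (std::size_t r = 0; r < ROWS; ++r)
        for (std::size_t c = 0; c < COLS; ++c)
            buf[c * ROWS + r] = src[r * COLS + c];   // transpose on write
    return buf;
}

int main() {
    std::vector<float> tile(ROWS * COLS);
    for (std::size_t i = 0; i < tile.size(); ++i) tile[i] = float(i);
    auto buf = buffer_transposed(tile);
    // A downstream engine would stream buf[0], buf[1], ... linearly.
    std::cout << "first column: " << buf[0] << ' ' << buf[1] << ' '
              << buf[2] << ' ' << buf[3] << '\n';    // prints 0 8 16 24
}
```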
-
Publication Number: US10984500B1
Publication Date: 2021-04-20
Application Number: US16576365
Application Date: 2019-09-19
Applicant: Xilinx, Inc.
Inventor: Ashish Sirasao, Elliott Delaye, Aaron Ng, Ehsan Ghasemi
IPC: G06T1/20, H03K19/1776, G06F3/03, G06T1/60, H04N21/2381, H04N5/30, G06F12/00
Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; bank address and control circuitry coupled to control inputs of the plurality of memory banks, the multiplexer circuitry, and the first plurality of registers; output control circuitry coupled to control inputs of the second plurality of registers; and a control state machine coupled to the bank address and control circuitry and the output control circuitry.
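The following is a hedged behavioral model of the datapath the abstract enumerates: memory banks feed a mux into a first register rank, which feeds a second rank whose outputs form the sample streams. Bank count, data widths, and the address pattern are assumptions for illustration; the real design is hardware, not software.

```cpp
// Behavioral sketch of the enumerated datapath: banks -> mux ->
// first register rank -> second register rank -> output streams.
#include <array>
#include <cstdint>
#include <iostream>

constexpr int BANKS = 4, DEPTH = 8;   // assumed geometry

struct Preprocessor {
    std::array<std::array<uint8_t, DEPTH>, BANKS> bank{};  // memory banks
    std::array<uint8_t, BANKS> reg1{}, reg2{};             // register ranks

    // One "clock": bank-address/control logic supplies an address, the
    // mux routes each selected sample into reg1, and output control
    // advances reg1 into reg2, whose outputs are the sample streams.
    void step(int addr) {
        reg2 = reg1;                          // second rank drives outputs
        for (int b = 0; b < BANKS; ++b)
            reg1[b] = bank[b][addr % DEPTH];  // mux: bank b -> reg1[b]
    }
};

int main() {
    Preprocessor p;
    for (int b = 0; b < BANKS; ++b)
        for (int d = 0; d < DEPTH; ++d)
            p.bank[b][d] = static_cast<uint8_t>(10 * b + d);
    for (int cycle = 0; cycle < 3; ++cycle) {
        p.step(cycle);
        std::cout << "cycle " << cycle << " streams:";
        for (uint8_t s : p.reg2) std::cout << ' ' << int(s);
        std::cout << '\n';   // note the one-cycle register-rank delay
    }
}
```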
-
Publication Number: US10943039B1
Publication Date: 2021-03-09
Application Number: US15786105
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Ashish Sirasao, Elliott Delaye, Sean Settle, Zhao Ma, Ehsan Ghasemi, Xiao Teng, Aaron Ng, Jindrich Zejda
IPC: G06F30/327, G06F7/544, G06N3/04, G06F30/34
Abstract: An example multiply-accumulate (MACC) circuit includes: a multiply-accumulator having an accumulator output register; a quantizer coupled to the multiply-accumulator; and a control circuit coupled to the multiply-accumulator and the quantizer, the control circuit configured to provide control data to the quantizer, the control data indicative of a most-significant bit (MSB) to least-significant bit (LSB) range for selecting bit indices from the accumulator output register.
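A small sketch of the quantizer idea, under assumed widths: the wide accumulator result is reduced by keeping only the bit window the control data names, from MSB down to LSB, with sign extension. Rounding and saturation policy are omitted, and the 8/32-bit widths are illustrative assumptions.

```cpp
// Sketch: multiply-accumulate into a wide register, then let control
// data pick an MSB..LSB window of the accumulator as the output.
#include <cstdint>
#include <iostream>

int32_t macc(const int8_t* a, const int8_t* b, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; ++i)
        acc += int32_t(a[i]) * int32_t(b[i]);   // accumulate products
    return acc;
}

// Quantizer: keep accumulator bits [msb..lsb], sign-extended.
// Assumes 0 <= lsb <= msb < 31 (widths kept small for the sketch).
int8_t quantize(int32_t acc, int msb, int lsb) {
    int32_t shifted = acc >> lsb;                     // drop bits below LSB
    int width = msb - lsb + 1;
    int32_t mask = (1 << width) - 1;
    int32_t v = shifted & mask;                       // keep the window
    if (v & (1 << (width - 1))) v -= (1 << width);    // sign-extend
    return static_cast<int8_t>(v);
}

int main() {
    int8_t a[4] = {10, -3, 7, 2}, b[4] = {5, 4, -6, 8};
    int32_t acc = macc(a, b, 4);                  // 50 - 12 - 42 + 16 = 12
    std::cout << "acc=" << acc
              << " q=" << int(quantize(acc, 10, 3)) << '\n';  // bits 10..3
}
```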
-
Publication Number: US20190114535A1
Publication Date: 2019-04-18
Application Number: US15786288
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Ashish Sirasao
Abstract: A disclosed neural network processing system includes a host computer system, RAMs coupled to the host computer system, and neural network accelerators coupled to the RAMs, respectively. The host computer system is configured with software that, when executed, causes the host computer system to write input data and work requests to the RAMs. Each work request specifies a subset of neural network operations to perform and the memory locations of the input data and parameters in a RAM. A graph of dependencies among the neural network operations is built, and additional dependencies are added. The operations are partitioned into coarse-grain tasks and fine-grain subtasks for optimal scheduling for parallel execution. The subtasks are scheduled to accelerator kernels of matching capabilities. Each neural network accelerator is configured to read a work request from the respective RAM and perform the subset of neural network operations on the input data using the parameters.
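As an illustration, the sketch below defines an assumed work-request layout (operation subset plus input/parameter offsets plus dependencies) and dispatches requests in naive dependency order. The field names and the toy scheduler are assumptions, not the patented format or scheduling algorithm.

```cpp
// Sketch of a work request and a naive dependency-ordered dispatcher.
#include <cstdint>
#include <iostream>
#include <vector>

enum class Op : uint8_t { Conv, MaxPool, FullyConnected };

struct WorkRequest {
    std::vector<Op> ops;      // subset of neural network operations to run
    uint64_t input_offset;    // where the input data sits in this RAM
    uint64_t param_offset;    // where the parameters sit in this RAM
    std::vector<int> deps;    // requests that must complete first
};

int main() {
    std::vector<WorkRequest> graph;
    graph.push_back({{Op::Conv, Op::MaxPool}, 0x0000, 0x8000, {}});
    graph.push_back({{Op::FullyConnected}, 0x4000, 0xC000, {0}});

    // Issue a request once every request it depends on has completed.
    std::vector<bool> done(graph.size(), false);
    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < int(graph.size()); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (int d : graph[i].deps) ready = ready && done[d];
            if (ready) {
                std::cout << "dispatch work request " << i << '\n';
                done[i] = progress = true;
            }
        }
    }
}
```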
-
Publication Number: US20190114529A1
Publication Date: 2019-04-18
Application Number: US15785800
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng, Elliott Delaye, Ehsan Ghasemi, Xiao Teng, Jindrich Zejda, Yongjun Wu, Sean Settle, Ashish Sirasao
IPC: G06N3/04
Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network and the respective offsets of the weight matrices in the shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
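Below is a hedged sketch of what such an instruction package might look like on the host side: one per-layer instruction per layer, each recording the offset at which that layer's weight matrix was packed into the shared memory. The encoding, field names, and sizes are assumptions.

```cpp
// Sketch: host assembles per-layer instructions, each carrying the
// shared-memory offset of that layer's weight matrix.
#include <cstdint>
#include <iostream>
#include <vector>

struct LayerInstruction {
    uint16_t layer_id;
    uint16_t op_code;          // assumed encoding of the layer's operation
    uint64_t weight_offset;    // offset of this layer's weights in shared mem
};

struct InstructionPackage {
    std::vector<LayerInstruction> per_layer;   // processed in order
};

int main() {
    // Pack weight matrices back to back and record each layer's offset.
    const uint64_t weight_bytes[3] = {4096, 16384, 1024};  // assumed sizes
    InstructionPackage pkg;
    uint64_t offset = 0;
    for (uint16_t l = 0; l < 3; ++l) {
        pkg.per_layer.push_back({l, /*op_code=*/0, offset});
        offset += weight_bytes[l];   // next matrix starts right after
    }
    for (const auto& ins : pkg.per_layer)
        std::cout << "layer " << ins.layer_id
                  << " weights at offset " << ins.weight_offset << '\n';
}
```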
-
Publication Number: US20230297824A1
Publication Date: 2023-09-21
Application Number: US17655489
Application Date: 2022-03-18
Applicant: Xilinx, Inc.
Inventor: Rajeev Patwari, Chaithanya Dudha, Jorn Tuyls, Kaushik Barman, Aaron Ng
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: A programmable, non-linear (PNL) activation engine for a neural network is capable of receiving input data within a circuit. In response to receiving an instruction corresponding to the input data, the PNL activation engine is capable of selecting a first non-linear activation function from a plurality of non-linear activation functions by decoding the instruction. The PNL activation engine is capable of fetching a first set of coefficients corresponding to the first non-linear activation function from a memory. The PNL activation engine is capable of performing a polynomial approximation of the first non-linear activation function on the input data using the first set of coefficients. The PNL activation engine is capable of outputting a result from the polynomial approximation of the first non-linear activation function.
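A compact sketch of that flow follows: the instruction selects a function, the matching coefficients are fetched from a table, and the polynomial approximation is evaluated with Horner's rule. The cubic coefficient values are illustrative near-zero fits chosen for the sketch, not production-quality approximations.

```cpp
// Sketch: decode activation selection, fetch coefficients, evaluate
// the polynomial approximation, output the result.
#include <array>
#include <cmath>
#include <iostream>

enum class Activation { Sigmoid = 0, Tanh = 1 };

// "Coefficient memory": one cubic per function, highest degree first.
constexpr std::array<std::array<float, 4>, 2> kCoeffs{{
    {-0.0208f, 0.0f, 0.25f, 0.5f},   // sigmoid(x) ~ 0.5 + x/4 - x^3/48
    {-0.3333f, 0.0f, 1.0f, 0.0f},    // tanh(x)    ~ x - x^3/3
}};

float activate(Activation fn, float x) {
    const auto& c = kCoeffs[static_cast<int>(fn)];  // fetch coefficients
    float y = 0.f;
    for (float ci : c) y = y * x + ci;              // Horner evaluation
    return y;
}

int main() {
    std::cout << "approx tanh(0.5) = " << activate(Activation::Tanh, 0.5f)
              << " (std::tanh gives " << std::tanh(0.5f) << ")\n";
}
```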
-
Publication Number: US11222256B2
Publication Date: 2022-01-11
Application Number: US15785685
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Xiao Teng, Aaron Ng, Ashish Sirasao, Elliott Delaye
Abstract: At least one neural network accelerator performs operations of a first subset of layers of a neural network on an input data set, generates an intermediate data set, and stores the intermediate data set in a shared memory queue in a shared memory. A first processor element of a host computer system provides input data to the neural network accelerator and signals the neural network accelerator to perform the operations of the first subset of layers of the neural network on the input data set. A second processor element of the host computer system reads the intermediate data set from the shared memory queue, performs operations of a second subset of layers of the neural network on the intermediate data set, and generates an output data set while the neural network accelerator is performing the operations of the first subset of layers of the neural network on another input data set.
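To illustrate the overlap, the sketch below models the shared memory queue as a single-producer/single-consumer ring buffer: an "accelerator" thread pushes intermediate data sets while a "CPU" thread pops them and would run the remaining layers on each. Queue depth, busy-wait synchronization, and data shapes are all simplifying assumptions.

```cpp
// Sketch: accelerator-side producer and CPU-side consumer overlap across
// successive inputs via a fixed-capacity shared queue.
#include <array>
#include <atomic>
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

constexpr std::size_t SLOTS = 4;   // assumed queue depth in shared memory

struct SharedQueue {               // single-producer/single-consumer ring
    std::array<std::vector<float>, SLOTS> slot;
    std::atomic<std::size_t> head{0}, tail{0};

    void push(std::vector<float> v) {          // accelerator side
        while (head - tail == SLOTS) {}        // spin while full (sketch only)
        slot[head % SLOTS] = std::move(v);
        ++head;
    }
    std::vector<float> pop() {                 // host CPU side
        while (head == tail) {}                // spin while empty (sketch only)
        auto v = std::move(slot[tail % SLOTS]);
        ++tail;
        return v;
    }
};

int main() {
    SharedQueue q;
    std::thread accel([&] {   // first subset of layers (accelerator)
        for (int i = 0; i < 8; ++i)
            q.push(std::vector<float>(16, float(i)));  // intermediate set
    });
    std::thread cpu([&] {     // second subset of layers (host processor)
        for (int i = 0; i < 8; ++i) {
            auto mid = q.pop();
            std::cout << "consumed intermediate set " << mid[0] << '\n';
        }
    });
    accel.join();
    cpu.join();
}
```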
-