Patent search ap:("Xilinx Page Inc.") AND inv:"Sean Settle"

1.

Invention application
MACHINE LEARNING RUNTIME LIBRARY FOR NEURAL NETWORK ACCELERATION (status: pending, published)

Publication (announcement) number: US20190114533A1

Publication (announcement) date: 2019-04-18

Application number: US15785679

Application date: 2017-10-17

Applicant: Xilinx, Inc.

Inventor: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Sonal Santan, Soren T. Soe, Ashish Sirasao, Ehsan Ghasemi, Sean Settle

IPC: G06N3/063, G06N3/10, G06N3/04, G06N3/08

Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., a FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage which each correspond to different threads. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the tasks. Because the stages correspond to different threads, the library can process multiple packets in parallel which can increase the utilization of the neural network accelerator on the hardware system.
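The three-stage pipeline described in this abstract can be illustrated with a minimal Python sketch. It is not the patented library: the `Packet` fields, the stage functions, and the arithmetic standing in for FPGA execution are all hypothetical, chosen only to show how one thread per stage lets several packets be in flight at once.

```python
import queue
import threading
from dataclasses import dataclass, field

@dataclass
class Packet:
    """Hypothetical packet: carries what each stage needs for one task."""
    task_id: int
    raw_input: list
    prepared: list = field(default_factory=list)
    device_output: list = field(default_factory=list)
    result: float = 0.0

_STOP = object()  # sentinel that tells a stage to shut down

def stage(work, inbox, outbox):
    """Apply `work` to every packet from inbox, then forward it to outbox."""
    while True:
        packet = inbox.get()
        if packet is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        work(packet)
        outbox.put(packet)

def pre_process(p):
    p.prepared = [x / 255.0 for x in p.raw_input]

def execute_on_accelerator(p):
    # Stand-in for the FPGA execution stage; no real device is involved.
    p.device_output = [2.0 * x for x in p.prepared]

def post_process(p):
    p.result = sum(p.device_output)

def run_pipeline(inputs):
    """Submit one packet per input and collect results keyed by task id."""
    q_pre, q_exec, q_post, q_done = (queue.Queue() for _ in range(4))
    threads = [
        threading.Thread(target=stage, args=(pre_process, q_pre, q_exec)),
        threading.Thread(target=stage, args=(execute_on_accelerator, q_exec, q_post)),
        threading.Thread(target=stage, args=(post_process, q_post, q_done)),
    ]
    for t in threads:
        t.start()
    for i, data in enumerate(inputs):
        q_pre.put(Packet(task_id=i, raw_input=data))
    q_pre.put(_STOP)
    results = {}
    while True:
        p = q_done.get()
        if p is _STOP:
            break
        results[p.task_id] = p.result
    for t in threads:
        t.join()
    return results
```

Calling `run_pipeline([[255, 0], [51, 51]])` pushes two packets through the stages. While the second packet is being pre-processed, the first can already be in the execution stage, which is the source of the utilization gain the abstract mentions.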

2.

Granted invention
Multi-layer neural network processing by a neural network accelerator using host communicated merged weights and a package of per-layer instructions (status: in force)

Publication (announcement) number: US11620490B2

Publication (announcement) date: 2023-04-04

Application number: US15785800

Application date: 2017-10-17

Applicant: Xilinx, Inc.

Inventor: Aaron Ng, Elliott Delaye, Ehsan Ghasemi, Xiao Teng, Jindrich Zejda, Yongjun Wu, Sean Settle, Ashish Sirasao

IPC: G06N3/04, G06N3/08, G06N3/063

Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
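The host/accelerator handoff in this abstract can be sketched roughly as follows. Everything here is an assumption made for illustration: the `LayerInstruction` fields, the dictionary standing in for shared memory, and the ReLU applied after each layer are not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class LayerInstruction:
    """Hypothetical per-layer instruction: layer shape plus weight offset."""
    rows: int
    cols: int
    weight_offset: int  # index of the layer's first weight in shared memory

def host_prepare(weight_matrices, input_vector):
    """Host side: merge all weights into one region and build the package."""
    merged_weights = []  # stands in for the shared-memory weight region
    package = []
    for matrix in weight_matrices:
        rows, cols = len(matrix), len(matrix[0])
        package.append(LayerInstruction(rows, cols, len(merged_weights)))
        for row in matrix:
            merged_weights.extend(row)
    return {"weights": merged_weights, "package": package,
            "input": list(input_vector)}

def accelerator_run(shared):
    """Accelerator side: read the package and process each layer in order."""
    activations = shared["input"]
    for instr in shared["package"]:
        outputs = []
        for r in range(instr.rows):
            start = instr.weight_offset + r * instr.cols
            row = shared["weights"][start:start + instr.cols]
            acc = sum(w * a for w, a in zip(row, activations))
            outputs.append(max(acc, 0))  # ReLU, an assumption of this sketch
        activations = outputs
    return activations
```

With `host_prepare([[[1, 2], [3, 4]], [[1, 1]]], [1, 1])`, the second instruction records offset 4 because the first layer occupies four weight slots. The accelerator side needs only the package and the offsets to walk through every layer without further host involvement.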

3.

Granted invention
Dynamically structured single instruction, multiple data (SIMD) instructions (status: in force)

Publication (announcement) number: US10824434B1

Publication (announcement) date: 2020-11-03

Application number: US16204991

Application date: 2018-11-29

Applicant: Xilinx, Inc.

Inventor: Sean Settle, Ehsan Ghasemi, Ashish Sirasao, Ralph D. Wittig

IPC: G06F9/38, G06F9/30

Abstract: Examples described herein relate to dynamically structured single instruction, multiple data (SIMD) instructions, and systems and circuits implementing such dynamically structured SIMD instructions. An example is a method for processing data. A first SIMD structure is determined by a processor. A characteristic of the first SIMD structure is altered by the processor to obtain a second SIMD structure. An indication of the second SIMD structure is communicated from the processor to a numerical engine. Data is packed by the numerical engine into an SIMD instruction according to the second SIMD structure. The SIMD instruction is transmitted from the numerical engine.
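As a rough illustration of the method in this abstract, the sketch below treats a SIMD structure as a list of lane widths in bits. That representation, and the helper names `merge_adjacent`, `pack_simd`, and `unpack_simd`, are hypothetical; the patent does not specify them here.

```python
def merge_adjacent(lane_bits, index):
    """Alter a structure by fusing lanes `index` and `index + 1` into one."""
    merged = lane_bits[index] + lane_bits[index + 1]
    return lane_bits[:index] + [merged] + lane_bits[index + 2:]

def pack_simd(values, lane_bits):
    """Pack unsigned values into one integer word, lane 0 in the low bits."""
    if len(values) != len(lane_bits):
        raise ValueError("one value is required per lane")
    word, shift = 0, 0
    for value, bits in zip(values, lane_bits):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
        word |= value << shift
        shift += bits
    return word

def unpack_simd(word, lane_bits):
    """Recover the per-lane values from a packed word."""
    values = []
    for bits in lane_bits:
        values.append(word & ((1 << bits) - 1))
        word >>= bits
    return values
```

Starting from a first structure `[8, 8, 8, 8]`, `merge_adjacent(first, 0)` yields a second structure `[16, 8, 8]`. The same 32-bit word can then carry a value such as 300 in lane 0, which would not fit in any 8-bit lane of the first structure.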

4.

Granted invention
Software-driven design optimization for mapping between floating-point and fixed-point multiply accumulators (status: in force)

Publication (announcement) number: US10678509B1

Publication (announcement) date: 2020-06-09

Application number: US16106743

Application date: 2018-08-21

Applicant: Xilinx, Inc.

Inventor: Sean Settle, Elliott Delaye, Aaron Ng, Ehsan Ghasemi, Ashish Sirasao, Xiao Teng, Jindrich Zejda

IPC: G06F7/544, G06F7/533, G06N3/08, G06N3/04

Abstract: An example multiply accumulate (MACC) circuit includes a multiply-accumulator having an accumulator output register, a scaler, coupled to the multiply accumulator, and a control circuit coupled to the multiply-accumulator and the scaler. The control circuit is configured to provide control data to the scaler, the control data indicative of: a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register for implementing a first right shift; a multiplier; and a second right shift.
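One plausible reading of the scaler's three control fields (an MSB-to-LSB range, a multiplier, and a second right shift) is the integer arithmetic below. The function names are hypothetical, and the sketch handles only non-negative accumulator values; sign handling and saturation are left out.

```python
def select_bits(acc, msb, lsb):
    """First right shift: keep bits msb..lsb (inclusive) of the accumulator."""
    width = msb - lsb + 1
    return (acc >> lsb) & ((1 << width) - 1)

def scale(acc, msb, lsb, multiplier, second_shift):
    """Select a bit range, multiply it, then apply the second right shift."""
    return (select_bits(acc, msb, lsb) * multiplier) >> second_shift

# Worked example: the accumulator holds 0b1011_0110_0000 (2912).
acc = 0b1011_0110_0000
selected = select_bits(acc, msb=11, lsb=4)   # 0b1011_0110 = 182
scaled = scale(acc, msb=11, lsb=4, multiplier=3, second_shift=2)  # (182 * 3) >> 2 = 136
```

Under this reading, the multiplier and second shift together approximate a fractional scale factor of `multiplier / 2**second_shift` (here 0.75) using integer operations only.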

5.

Granted invention
Circuit arrangements and methods for performing multiply-and-accumulate operations (status: in force)

Publication (announcement) number: US10572225B1

Publication (announcement) date: 2020-02-25

Application number: US16142406

Application date: 2018-09-26

Applicant: Xilinx, Inc.

Inventor: Ehsan Ghasemi, Elliott Delaye, Ashish Sirasao, Sean Settle

IPC: G06F7/544, G06N3/02, G06T1/60, G06N3/08

Abstract: A request generator circuit is configured to read data elements of a three-dimensional (3-D) input feature map (IFM) from a memory and store a subset of the data elements in one of a plurality of N line buffers. Each line buffer is configured for storage of M data elements. A pixel iterator circuit is coupled to the line buffers and is configured to generate a sequence of addresses for reading the stored data elements from the line buffers based on a sequence of IFM height values and a sequence of IFM width values.
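A heavily simplified software analogy of the line buffers and pixel iterator is sketched below. It works on a single 2-D plane of the IFM rather than the full 3-D map, and the function names and the zero padding of short rows are assumptions made for illustration, not details from the patent.

```python
def load_rows(ifm_plane, first_row, n_buffers, m_elements):
    """Request-generator stand-in: copy N consecutive rows into N line buffers."""
    buffers = []
    for r in range(first_row, first_row + n_buffers):
        row = list(ifm_plane[r][:m_elements])
        # Pad so that every buffer holds exactly M elements (an assumption).
        buffers.append(row + [0] * (m_elements - len(row)))
    return buffers

def pixel_iterator(height_values, width_values):
    """Yield (buffer index, element address) pairs in row-major order."""
    for h in height_values:
        for w in width_values:
            yield h, w

def read_window(buffers, height_values, width_values):
    """Read the elements addressed by the iterator from the line buffers."""
    return [buffers[h][w] for h, w in pixel_iterator(height_values, width_values)]
```

For a 3x3 plane `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, loading rows 1 and 2 into two buffers of four elements and iterating heights `range(2)` and widths `range(1, 3)` reads the 2x2 window `[5, 6, 8, 9]`.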

6.

Granted invention
Machine learning runtime library for neural network acceleration (status: in force)

Publication (announcement) number: US11694066B2

Publication (announcement) date: 2023-07-04

Application number: US15785679

Application date: 2017-10-17

Applicant: Xilinx, Inc.

Inventor: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Sonal Santan, Soren T. Soe, Ashish Sirasao, Ehsan Ghasemi, Sean Settle

IPC: G06N3/063, G06N3/10, G06N3/08, G06N3/04, G06V10/94, G06N3/045

CPC classification number: G06N3/063, G06N3/04, G06N3/08, G06N3/10, G06N3/045, G06V10/955

Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., a FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage which each correspond to different threads. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the tasks. Because the stages correspond to different threads, the library can process multiple packets in parallel which can increase the utilization of the neural network accelerator on the hardware system.

7.

Granted invention
Software-driven design optimization for fixed-point multiply-accumulate circuitry (status: in force)

Publication (announcement) number: US10943039B1

Publication (announcement) date: 2021-03-09

Application number: US15786105

Application date: 2017-10-17

Applicant: Xilinx, Inc.

Inventor: Ashish Sirasao, Elliott Delaye, Sean Settle, Zhao Ma, Ehsan Ghasemi, Xiao Teng, Aaron Ng, Jindrich Zejda

IPC: G06F30/327, G06F7/544, G06N3/04, G06F30/34

Abstract: An example multiply accumulate (MACC) circuit includes: a multiply-accumulator having an accumulator output register; a quantizer, coupled to the multiply accumulator; and a control circuit coupled to the multiply-accumulator and the quantizer, the control circuit configured to provide control data to the quantizer, the control data indicative of a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register.

8.

Invention application
MULTI-LAYER NEURAL NETWORK PROCESSING BY A NEURAL NETWORK ACCELERATOR USING HOST COMMUNICATED MERGED WEIGHTS AND A PACKAGE OF PER-LAYER INSTRUCTIONS (status: pending, published)

Publication (announcement) number: US20190114529A1

Publication (announcement) date: 2019-04-18

Application number: US15785800

Application date: 2017-10-17

Applicant: Xilinx, Inc.

Inventor: Aaron Ng, Elliott Delaye, Ehsan Ghasemi, Xiao Teng, Jindrich Zejda, Yongjun Wu, Sean Settle, Ashish Sirasao

IPC: G06N3/04

Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
