Patent search ap:("Xilinx Page Inc.") AND inv:"Ehsan Ghasemi"

1.

发明申请
MACHINE LEARNING RUNTIME LIBRARY FOR NEURAL NETWORK ACCELERATION 审中-公开

公开(公告)号：US20190114533A1

公开(公告)日：2019-04-18

申请号：US15785679

申请日：2017-10-17

Applicant: Xilinx, Inc.

Inventor： Aaron Ng , Jindrich Zejda , Elliott Delaye , Xiao Teng , Sonal Santan , Soren T. Soe , Ashish Sirasao , Ehsan Ghasemi , Sean Settle

IPC: G06N3/063 , G06N3/10 , G06N3/04 , G06N3/08

Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., a FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage which each correspond to different threads. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the tasks. Because the stages correspond to different threads, the library can process multiple packets in parallel which can increase the utilization of the neural network accelerator on the hardware system.

2.

发明授权
Reconfigurable neural engine with extensible instruction set architecture 有权

公开(公告)号：US12079158B2

公开(公告)日：2024-09-03

申请号：US17814817

申请日：2022-07-25

Applicant: Xilinx, Inc.

Inventor： Sanket Pandit , Jorn Tuyls , Xiao Teng , Rajeev Patwari , Ehsan Ghasemi , Elliott Delaye , Aaron Ng

IPC: G06F15/76 , G06F9/455 , G06F15/80

CPC classification number: G06F15/8053 , G06F9/45533

Abstract: An integrated circuit includes a plurality of kernels and a virtual machine coupled to the plurality of kernels. The virtual machine is configured to interpret instructions directed to different ones of the plurality of kernels. The virtual machine is configured to control operation of the different ones of the plurality of kernels responsive to the instructions.

3.

发明公开
INSTRUCTION SET ARCHITECTURE FOR DATA PROCESSING ARRAY CONTROL 审中-公开

公开(公告)号：US20240045692A1

公开(公告)日：2024-02-08

申请号：US17818309

申请日：2022-08-08

Applicant: Xilinx, Inc.

Inventor： Xiao Teng , Tejus Siddagangaiah , Bryan Lozano , Ehsan Ghasemi , Rajeev Patwari , Elliott Delaye , Jorn Tuyls , Aaron Ng , Sanket Pandit , Pramod Peethambaran , Satyaprakash Pareek

IPC: G06F9/38 , G06F9/46 , G06F9/30

CPC classification number: G06F9/3814 , G06F9/467 , G06F9/3004

Abstract: Controlling a data processing (DP) array includes creating a replica of a register address space of the DP array based on the design and the DP array. A sequence of instructions, including write instructions and read instructions, is received. The write instructions correspond to buffer descriptors specifying runtime data movements for a design for a DP array. The write instructions are converted into transaction instructions and the read instructions are converted into wait instructions based on the replica of the register address space. The transaction instructions and the wait instructions are included in an instruction buffer. The instruction buffer is provided to a microcontroller configured to execute the transaction instructions and the wait instructions to implement the runtime data movements for the design as implemented in the DP array. In another aspect, the instruction buffer is stored in a file for subsequent execution by the microcontroller.

4.

发明公开
HARDWARE ACCELERATION OF MACHINE LEARNING DESIGNS 审中-公开

公开(公告)号：US20230401480A1

公开(公告)日：2023-12-14

申请号：US17806906

申请日：2022-06-14

Applicant: Xilinx, Inc.

Inventor： Ehsan Ghasemi , Rajeev Patwari , Elliott Delaye , Jorn Tuyls , Ephrem C. Wu , Xiao Teng , Sanket Pandit

IPC: G06N20/00

CPC classification number: G06N20/00

Abstract: Hardware acceleration of machine learning (ML) designs includes translating an ML primitive into an intermediate representation. The intermediate representation is subdivided to specify a functional compute block. The functional compute block is sized according to a compute node primitive adapted for implementing the ML primitive on target hardware. An overlay is generated for the ML primitive, at least in part, by mapping the functional compute block to the compute node primitive. The overlay is synthesizable to implement the ML primitive on the target hardware. The overlay can be scheduled for operation within the target hardware as part of an ML design including the ML primitive.

5.

发明授权
Circuit arrangements and methods for traversing input feature maps 有权

公开(公告)号：US11106968B1

公开(公告)日：2021-08-31

申请号：US15989075

申请日：2018-05-24

Applicant: Xilinx, Inc.

Inventor： Ehsan Ghasemi , Elliott Delaye , Ashish Sirasao

IPC: G06N3/04

Abstract: A circuit arrangement includes a buffer, a height traversal circuit configured to generate a sequence of IFM height values in response to first control signals, a width traversal circuit configured to generate a sequence of IFM width values in response to second control signals, a control circuit, and an address generation circuit. The control circuit is configured to input an OFM height, an OFM width, a kernel height, and a kernel width; generate the first control signals at times based on the OFM height and the kernel height; and generate the second control signals at times based on the OFM width and the kernel width. The address generation circuit is configured to generate a sequence of addresses based on the sequences of IFM height values and IFM width values, provide the sequence of addresses to the buffer, and enable reading from the buffer.

6.

发明授权
Machine learning runtime library for neural network acceleration 有权

公开(公告)号：US11694066B2

公开(公告)日：2023-07-04

申请号：US15785679

申请日：2017-10-17

Applicant: Xilinx, Inc.

Inventor： Aaron Ng , Jindrich Zejda , Elliott Delaye , Xiao Teng , Sonal Santan , Soren T. Soe , Ashish Sirasao , Ehsan Ghasemi , Sean Settle

IPC: G06N3/063 , G06N3/10 , G06N3/08 , G06N3/04 , G06V10/94 , G06N3/045

CPC classification number: G06N3/063 , G06N3/04 , G06N3/08 , G06N3/10 , G06N3/045 , G06V10/955

Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., a FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage which each correspond to different threads. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the tasks. Because the stages correspond to different threads, the library can process multiple packets in parallel which can increase the utilization of the neural network accelerator on the hardware system.

7.

发明授权
Inline image preprocessing for convolution operations using a matrix multiplier on an integrated circuit 有权

公开(公告)号：US10984500B1

公开(公告)日：2021-04-20

申请号：US16576365

申请日：2019-09-19

Applicant: Xilinx, Inc.

Inventor： Ashish Sirasao , Elliott Delaye , Aaron Ng , Ehsan Ghasemi

IPC: G06T1/20 , H03K19/1776 , G06F3/03 , G06T1/60 , H04N21/2381 , H04N5/30 , G06F12/00

Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; bank address and control circuitry coupled to control inputs of the plurality of memory banks, the multiplexer circuitry, and the first plurality of registers; output control circuitry coupled to control inputs of the second plurality of registers; and a control state machine coupled to the bank address and control circuitry and the output control circuitry.

8.

发明授权
Software-driven design optimization for fixed-point multiply-accumulate circuitry 有权

公开(公告)号：US10943039B1

公开(公告)日：2021-03-09

申请号：US15786105

申请日：2017-10-17

Applicant: Xilinx, Inc.

Inventor： Ashish Sirasao , Elliott Delaye , Sean Settle , Zhao Ma , Ehsan Ghasemi , Xiao Teng , Aaron Ng , Jindrich Zejda

IPC: G06F30/327 , G06F7/544 , G06N3/04 , G06F30/34

Abstract: An example multiply accumulate (MACC) circuit includes: a multiply-accumulator having an accumulator output register; a quantizer, coupled to the multiply accumulator; and a control circuit coupled to the multiply-accumulator and the quantizer, the control circuit configured to provide control data to the quantizer, the control data indicative of a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register.

9.

发明申请
MULTI-LAYER NEURAL NETWORK PROCESSING BY A NEURAL NETWORK ACCELERATOR USING HOST COMMUNICATED MERGED WEIGHTS AND A PACKAGE OF PER-LAYER INSTRUCTIONS 审中-公开

公开(公告)号：US20190114529A1

公开(公告)日：2019-04-18

申请号：US15785800

申请日：2017-10-17

Applicant: Xilinx, Inc.

Inventor： Aaron Ng , Elliott Delaye , Ehsan Ghasemi , Xiao Teng , Jindrich Zejda , Yongjun Wu , Sean Settle , Ashish Sirasao

IPC: G06N3/04

Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.

10.

发明授权
Instruction set architecture for data processing array control 有权

公开(公告)号：US12248786B2

公开(公告)日：2025-03-11

申请号：US17818309

申请日：2022-08-08

Applicant: Xilinx, Inc.

Inventor： Xiao Teng , Tejus Siddagangaiah , Bryan Lozano , Ehsan Ghasemi , Rajeev Patwari , Elliott Delaye , Jorn Tuyls , Aaron Ng , Sanket Pandit , Pramod Peethambaran , Satyaprakash Pareek

IPC: G06F9/30 , G06F9/38 , G06F9/46

Abstract: Controlling a data processing (DP) array includes creating a replica of a register address space of the DP array based on the design and the DP array. A sequence of instructions, including write instructions and read instructions, is received. The write instructions correspond to buffer descriptors specifying runtime data movements for a design for a DP array. The write instructions are converted into transaction instructions and the read instructions are converted into wait instructions based on the replica of the register address space. The transaction instructions and the wait instructions are included in an instruction buffer. The instruction buffer is provided to a microcontroller configured to execute the transaction instructions and the wait instructions to implement the runtime data movements for the design as implemented in the DP array. In another aspect, the instruction buffer is stored in a file for subsequent execution by the microcontroller.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification