-
11.
Publication number: US20190114499A1
Publication date: 2019-04-18
Application number: US15786267
Filing date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Elliott Delaye , Ashish Sirasao , Aaron Ng , Yongjun Wu , Jindrich Zejda
Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a first buffer configured to store a plurality of rows of the image data and output a row of the plurality of rows; a second buffer, coupled to the first buffer, including a plurality of storage locations to store a respective plurality of image samples of the row output by the first buffer; a plurality of shift registers; an interconnect network including a plurality of connections, each connection coupling a respective one of the plurality of shift registers to more than one of the plurality of storage locations, one or more of the plurality of storage locations being coupled to more than one of the plurality of connections; and a control circuit configured to load the plurality of shift registers with the plurality of image samples based on the plurality of connections and shift the plurality of shift registers to output the plurality of streams of image samples.
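The dataflow this abstract describes can be illustrated in software. Below is a minimal Python sketch assuming a toy connection map and three output streams; the names, the stride pattern, and the buffer sizes are illustrative assumptions, not the patent's circuit.

```python
from collections import deque

def stream_samples(image_rows, num_streams=3):
    """Toy model: a row buffer feeds a sample buffer; a fixed connection
    map loads each "shift register" from several storage locations, and
    the registers then shift out one sample per stream per cycle."""
    for row in image_rows:                    # first buffer: one row at a time
        sample_buffer = list(row)             # second buffer: storage locations
        # Interconnect: shift register s is fed by locations s, s+num_streams, ...
        shift_regs = [deque(sample_buffer[s::num_streams])
                      for s in range(num_streams)]
        # Control: shift all registers in lockstep, one sample per stream.
        while any(shift_regs):
            yield tuple(r.popleft() if r else None for r in shift_regs)

for samples in stream_samples([[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]):
    print(samples)   # one tuple per cycle: (stream0, stream1, stream2)
```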
-
12.
Publication number: US20230153583A1
Publication date: 2023-05-18
Application number: US17454935
Filing date: 2021-11-15
Applicant: Xilinx, Inc.
Inventor: Ashish Sirasao , Vishal Kumar Jain , Sumit Nagpal
CPC classification number: G06N3/049 , G06F8/41 , G06N3/10 , G06F16/9024
Abstract: Processing of a neural network specification includes gathering first layers of a neural network graph into groups of layers based on profiled compute times of the layers and equalized compute times between the groups. Each group is a subgraph of one or more of the layers of the neural network. The neural network graph is compiled into instructions for pipelined execution of the neural network graph by compute circuits. The compiling includes designating, for each first subgraph of the subgraphs having output activations that are input activations of a second subgraph of the subgraphs, operations of the first subgraph to be performed by a first compute circuit and operations of the second subgraph to be performed by a second compute circuit. The compute circuits are configured to execute the instructions.
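As a rough illustration of the grouping step, the Python sketch below packs profiled per-layer compute times into groups with roughly equal totals. The greedy boundary rule and the example timings are assumptions; the abstract only requires equalized compute times between groups.

```python
def group_layers(layer_times, num_groups):
    """Partition a layer sequence into num_groups contiguous groups
    whose summed compute times are roughly equal."""
    total = sum(layer_times)
    boundary = total / num_groups            # ideal per-group share
    groups, current, elapsed = [], [], 0.0
    for i, t in enumerate(layer_times):
        current.append(i)
        elapsed += t
        # Close a group each time the running total crosses the next
        # equal-share boundary, leaving room for the remaining groups.
        if elapsed >= boundary * (len(groups) + 1) and len(groups) < num_groups - 1:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

# Example: profiled times (ms) for eight layers, split across three circuits.
print(group_layers([4, 2, 3, 5, 1, 1, 6, 2], 3))   # [[0, 1, 2], [3, 4, 5], [6, 7]]
```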
-
13.
Publication number: US11620490B2
Publication date: 2023-04-04
Application number: US15785800
Filing date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Elliott Delaye , Ehsan Ghasemi , Xiao Teng , Jindrich Zejda , Yongjun Wu , Sean Settle , Ashish Sirasao
Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
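A minimal sketch of the host-side flow described above, assuming a byte-addressed shared memory and an illustrative instruction format (the abstract does not give the real per-layer instruction fields):

```python
import numpy as np

shared_mem = bytearray(1 << 20)           # stand-in for the shared memory

def write_blob(offset, blob):
    shared_mem[offset:offset + len(blob)] = blob
    return offset + len(blob)             # next free offset

# Write each layer's weight matrix, record its offset, and assemble
# one per-layer instruction per layer into a single package.
weights = [np.ones((8, 8), dtype=np.int16), np.ones((8, 4), dtype=np.int16)]
package, cursor = [], 0
for layer, w in enumerate(weights):
    package.append({"layer": layer, "op": "fully_connected",
                    "weight_offset": cursor, "weight_shape": w.shape})
    cursor = write_blob(cursor, w.tobytes())

# The input data and the package itself would likewise be written to
# shared memory for the accelerator to read and execute layer by layer.
print(package)
```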
-
14.
Publication number: US11429848B2
Publication date: 2022-08-30
Application number: US15786102
Filing date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Elliott Delaye , Jindrich Zejda , Ashish Sirasao
Abstract: In disclosed approaches of neural network processing, a host computer system copies an input data matrix from host memory to a shared memory for performing neural network operations of a first layer of a neural network by a neural network accelerator. The host instructs the neural network accelerator to perform neural network operations of each layer of the neural network beginning with the input data matrix. The neural network accelerator performs neural network operations of each layer in response to the instruction from the host. The host waits until the neural network accelerator signals completion of performing neural network operations of layer i before instructing the neural network accelerator to commence performing neural network operations of layer i+1, for i≥1. The host instructs the neural network accelerator to use a results data matrix in the shared memory from layer i as an input data matrix for layer i+1 for i≥1.
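A minimal sketch (not Xilinx's actual API) of this handshake, with a thread standing in for the accelerator and ping-pong buffers modeling the reuse of layer i's results as layer i+1's input:

```python
import threading

done = threading.Event()
shared = {"buf0": [1.0, 2.0], "buf1": None}   # stand-in shared memory

def accelerator_run(in_buf, out_buf):
    shared[out_buf] = [2 * x for x in shared[in_buf]]   # stand-in compute
    done.set()                                          # completion signal

bufs = ["buf0", "buf1"]
for i in range(1, 3):                         # layers 1 and 2
    in_buf, out_buf = bufs[(i - 1) % 2], bufs[i % 2]    # ping-pong buffers
    done.clear()
    threading.Thread(target=accelerator_run, args=(in_buf, out_buf)).start()
    done.wait()   # host blocks until layer i completes before issuing i+1
    print(f"layer {i} results:", shared[out_buf])
```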
-
15.
Publication number: US10936311B1
Publication date: 2021-03-02
Application number: US16505987
Filing date: 2019-07-09
Applicant: Xilinx, Inc.
Inventor: Ling Liu , Yifei Zhou , Xiao Teng , Ashish Sirasao , Chuanhua Song , Aaron Ng
Abstract: Disclosed approaches for multiplying a sparse matrix by a dense vector or matrix include first memory banks for storage of column indices, second memory banks for storage of row indices, and third memory banks for storage of non-zero values of a sparse matrix. A pairing circuit distributes an input stream of vector elements across first first-in-first-out (FIFO) buffers according to the buffered column indices. Multiplication circuitry multiplies vector elements output from the first FIFO buffers by corresponding ones of the non-zero values from the third memory banks, and stores products in second FIFO buffers. Row-aligner circuitry organizes the products output from the second FIFO buffers into third FIFO buffers according to row indices in the second memory banks. Accumulation circuitry accumulates respective totals from products output from the third FIFO buffers.
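This pipeline can be modeled in software with queues standing in for the FIFO buffers and memory banks. A sketch under those assumptions (a single queue per stage and a toy 3x3 matrix):

```python
from collections import defaultdict, deque

# Sparse matrix as (row, col, value) triples, sorted by column so the
# streaming vector can be paired against it in order.
nnz = sorted([(0, 0, 2.0), (1, 0, -1.0), (0, 2, 4.0), (2, 1, 3.0)],
             key=lambda t: t[1])
x = [1.0, 2.0, 0.5]                      # dense input vector

pair_fifo, product_fifo = deque(), deque()

# Pairing stage: route each vector element to every non-zero whose
# column index matches it.
for row, col, val in nnz:
    pair_fifo.append((row, val, x[col]))

# Multiply stage: non-zero value times paired vector element.
while pair_fifo:
    row, val, xj = pair_fifo.popleft()
    product_fifo.append((row, val * xj))

# Row-align + accumulate stage: sum products per row index.
y = defaultdict(float)
while product_fifo:
    row, p = product_fifo.popleft()
    y[row] += p

print([y[r] for r in range(3)])          # dense result vector: [4.0, -1.0, 6.0]
```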
-
16.
Publication number: US20200089472A1
Publication date: 2020-03-19
Application number: US16136041
Filing date: 2018-09-19
Applicant: Xilinx, Inc.
Inventor: Satyaprakash Pareek , Anup Hosangadi , Bing Tian , Ashish Sirasao , Yao Fu , Oscar Fernando C. Fernandez , Michael Wu , Christopher H. Dick
Abstract: Circuits and methods for multiplying floating-point operands. An exponent adder circuit sums a first exponent and a second exponent and generates an output exponent. A mantissa multiplier circuit multiplies a first mantissa and a second mantissa and generates an output mantissa. A first conversion circuit converts the output exponent and output mantissa into a fixed-point number. An accumulator circuit sums contents of an accumulation register and the fixed-point number into an accumulated value and stores the accumulated value in the accumulation register.
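A numeric sketch of the multiply-then-accumulate-in-fixed-point idea, assuming a Q16.16 accumulator format (the abstract does not specify one):

```python
import math

FRAC_BITS = 16                                    # Q16.16 fixed point (assumed)

def float_mul_to_fixed(a, b):
    ma, ea = math.frexp(a)                        # mantissa, exponent of a
    mb, eb = math.frexp(b)
    out_exp, out_man = ea + eb, ma * mb           # exponent add, mantissa multiply
    product = math.ldexp(out_man, out_exp)        # recombine into a float product
    return round(product * (1 << FRAC_BITS))      # convert to fixed point

acc = 0                                           # accumulation register (integer)
for a, b in [(1.5, 2.0), (0.25, 4.0), (-3.0, 0.5)]:
    acc += float_mul_to_fixed(a, b)

print(acc / (1 << FRAC_BITS))                     # 1.5*2 + 0.25*4 - 3*0.5 = 2.5
```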
-
17.
Publication number: US10515135B1
Publication date: 2019-12-24
Application number: US15785688
Filing date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Elliott Delaye , Aaron Ng , Ashish Sirasao , Yongjun Wu
Abstract: Methods and apparatus are described for performing data-intensive compute algorithms, such as fast massively parallel general matrix multiplication (GEMM), using a particular data format for both storing data to and reading data from memory. This data format may be utilized for arbitrarily sized input matrices for GEMM implemented on a finite-size GEMM accelerator in the form of a rectangular compute array of digital signal processing (DSP) elements or similar compute cores. This data format addresses double data rate (DDR) dynamic random access memory (DRAM) bandwidth limits by allowing both linear DDR addressing and single-cycle loading of data into the compute array, avoiding input/output (I/O) and/or DDR bottlenecks.
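The gist of such a format can be sketched in Python: pad an arbitrarily sized matrix and store it tile by tile, so a fixed-size compute array reads each tile from one linear address range. The tile dimensions and zero-padding scheme below are illustrative assumptions:

```python
import numpy as np

def to_tiled_linear(a, tile_rows, tile_cols):
    """Pad a to whole tiles, then lay the tiles out contiguously so
    tile (i, j) occupies a single linear span in memory."""
    rows = -(-a.shape[0] // tile_rows) * tile_rows   # round up to tile multiple
    cols = -(-a.shape[1] // tile_cols) * tile_cols
    padded = np.zeros((rows, cols), dtype=a.dtype)
    padded[:a.shape[0], :a.shape[1]] = a
    tiles = [padded[r:r + tile_rows, c:c + tile_cols].ravel()
             for r in range(0, rows, tile_rows)
             for c in range(0, cols, tile_cols)]
    return np.concatenate(tiles)

a = np.arange(1, 13).reshape(3, 4)       # 3x4 input, 2x2 compute tiles
print(to_tiled_linear(a, 2, 2))          # [1 2 5 6 3 4 7 8 9 10 0 0 11 12 0 0]
```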
-
18.
Publication number: US20190114548A1
Publication date: 2019-04-18
Application number: US15786434
Filing date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Yongjun Wu , Jindrich Zejda , Elliott Delaye , Ashish Sirasao
Abstract: Embodiments herein describe techniques for statically scheduling a neural network implemented in a massively parallel hardware system. The neural network may be scheduled using three different scheduling levels referred to herein as an upper level, an intermediate level, and a lower level. In one embodiment, the upper level includes a hardware or software model of the layers in the neural network that establishes a sequential order of functions that operate concurrently in the hardware system. In the intermediate level, identical processes in the functions defined in the upper level are connected to form a systolic array or mesh, and balanced data flow channels are used to minimize latency. In the lower level, a compiler can assign the operations performed by the processing elements in the systolic array to different portions of the hardware system to provide a static schedule for the neural network.
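The upper-level idea, layers operating concurrently under a fixed sequential order, amounts to a static pipeline schedule. A toy sketch with made-up layer names and depth:

```python
layers = ["conv1", "pool1", "conv2", "fc"]
num_images = 3

for step in range(num_images + len(layers) - 1):
    # Static schedule: at step t, layer i works on image t - i
    # (whenever that image exists), so all layers run concurrently.
    active = [f"{layer}<-img{step - i}" for i, layer in enumerate(layers)
              if 0 <= step - i < num_images]
    print(f"step {step}:", ", ".join(active))
```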
-
19.
Publication number: US20190114538A1
Publication date: 2019-04-18
Application number: US15786102
Filing date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Elliott Delaye , Jindrich Zejda , Ashish Sirasao
IPC: G06N3/08 , G06F9/28 , H03K19/177 , G06F17/16
Abstract: In disclosed approaches of neural network processing, a host computer system copies an input data matrix from host memory to a shared memory for performing neural network operations of a first layer of a neural network by a neural network accelerator. The host instructs the neural network accelerator to perform neural network operations of each layer of the neural network beginning with the input data matrix. The neural network accelerator performs neural network operations of each layer in response to the instruction from the host. The host waits until the neural network accelerator signals completion of performing neural network operations of layer i before instructing the neural network accelerator to commence performing neural network operations of layer i+1, for i≥1. The host instructs the neural network accelerator to use a results data matrix in the shared memory from layer i as an input data matrix for layer i+1 for i≥1.
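The host side of this handshake is sketched under the granted counterpart (US11429848B2) above; below is the complementary accelerator side, servicing one per-layer instruction at a time with an assumed instruction format:

```python
shared_mem = {"in": [1.0, 2.0, 3.0], "out": None, "done": False}

def run_layer(instr):
    data = shared_mem[instr["src"]]
    shared_mem[instr["dst"]] = [max(0.0, 2 * v) for v in data]  # stand-in layer
    shared_mem["done"] = True                                   # completion signal

instructions = [{"layer": 1, "src": "in", "dst": "out"},
                {"layer": 2, "src": "out", "dst": "in"}]        # ping-pong buffers
for instr in instructions:
    shared_mem["done"] = False
    run_layer(instr)
    assert shared_mem["done"]   # the host would wait on this before layer i+1

print(shared_mem["in"])         # layer 2's results: [4.0, 8.0, 12.0]
```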
-
20.
Publication number: US20190114534A1
Publication date: 2019-04-18
Application number: US15785685
Filing date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Xiao Teng , Aaron Ng , Ashish Sirasao , Elliott Delaye
Abstract: At least one neural network accelerator performs operations of a first subset of layers of a neural network on an input data set, generates an intermediate data set, and stores the intermediate data set in a shared memory queue in a shared memory. A first processor element of a host computer system provides input data to the neural network accelerator and signals the neural network accelerator to perform the operations of the first subset of layers of the neural network on the input data set. A second processor element of the host computer system reads the intermediate data set from the shared memory queue, performs operations of a second subset of layers of the neural network on the intermediate data set, and generates an output data set while the neural network accelerator is performing the operations of the first subset of layers of the neural network on another input data set.
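A minimal sketch (not the actual runtime) of this producer/consumer pipelining, with a Python queue standing in for the shared memory queue and trivial arithmetic standing in for the two layer subsets:

```python
import queue
import threading

inter_q = queue.Queue(maxsize=2)          # shared memory queue stand-in

def accelerator(inputs):                  # first subset of layers
    for x in inputs:
        inter_q.put([v + 1 for v in x])   # stand-in for the early layers
    inter_q.put(None)                     # end-of-stream marker

def host_processor():                     # second subset of layers
    while (item := inter_q.get()) is not None:
        print("output:", [v * 10 for v in item])   # stand-in for the late layers

# The accelerator thread works on the next input while the host
# processor element consumes the previous intermediate data set.
t = threading.Thread(target=accelerator, args=([[1, 2], [3, 4]],))
t.start()
host_processor()
t.join()
```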
-