-
Publication Number: US11620490B2
Publication Date: 2023-04-04
Application Number: US15785800
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Elliott Delaye , Ehsan Ghasemi , Xiao Teng , Jindrich Zejda , Yongjun Wu , Sean Settle , Ashish Sirasao
Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and the respective offsets of the weight matrices in the shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
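The host/accelerator flow in this abstract lends itself to a small illustration. Below is a minimal C sketch of the host assembling an instruction package of per-layer instructions, each carrying the weight-matrix offset for its layer, and writing it to the memory shared with the accelerator. All struct names, fields, sizes, and offsets here are hypothetical illustrations, not the patent's actual encoding.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One per-layer instruction: the layer operation plus the byte offsets, in
 * shared memory, of that layer's weight matrix and data buffers. */
typedef struct {
    uint32_t layer_op;       /* hypothetical opcode, e.g. conv / fc */
    uint32_t weight_offset;  /* where this layer's weights were written */
    uint32_t input_offset;   /* where the layer reads its input */
    uint32_t output_offset;  /* where the layer writes its result */
} LayerInstr;

/* The instruction package: a layer count followed by per-layer entries. */
typedef struct {
    uint32_t   num_layers;
    LayerInstr layers[8];    /* hypothetical fixed maximum */
} InstrPackage;

int main(void)
{
    uint8_t shared_mem[4096] = {0};  /* stand-in for host/accelerator memory */

    /* Host side: assemble one instruction per layer; the weight offsets
     * refer to matrices the host has (notionally) already written. */
    InstrPackage pkg = { .num_layers = 2 };
    pkg.layers[0] = (LayerInstr){ 1, 0x100, 0x800, 0x900 };
    pkg.layers[1] = (LayerInstr){ 2, 0x400, 0x900, 0xA00 };
    memcpy(shared_mem, &pkg, sizeof pkg);  /* write package to shared memory */

    /* Accelerator side (simulated): read the package back and process each
     * per-layer instruction in order. */
    InstrPackage rd;
    memcpy(&rd, shared_mem, sizeof rd);
    for (uint32_t i = 0; i < rd.num_layers; ++i)
        printf("layer %u: op=%u weights@0x%X\n",
               (unsigned)i, (unsigned)rd.layers[i].layer_op,
               (unsigned)rd.layers[i].weight_offset);
    return 0;
}
```

The accelerator-side loop is simulated in software here; in hardware it would be the accelerator itself walking the package out of the shared memory.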
-
Publication Number: US10515135B1
Publication Date: 2019-12-24
Application Number: US15785688
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Elliott Delaye , Aaron Ng , Ashish Sirasao , Yongjun Wu
Abstract: Methods and apparatus are described for performing data-intensive compute algorithms, such as fast massively parallel general matrix multiplication (GEMM), using a particular data format for both storing data to and reading data from memory. This data format may be utilized for arbitrarily sized input matrices for GEMM implemented on a finite-size GEMM accelerator in the form of a rectangular compute array of digital signal processing (DSP) elements or similar compute cores. This data format addresses the double data rate (DDR) dynamic random access memory (DRAM) bandwidth limitation by allowing both linear DDR addressing and single-cycle loading of data into the compute array, avoiding input/output (I/O) and/or DDR bottlenecks.
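To make the layout idea concrete, here is a C sketch of one plausible packing of the kind described: a row-major matrix is regrouped into panels of R rows (R matching the compute-array height), stored so that each R-element column slice occupies consecutive DDR addresses and can be loaded into the array in one cycle. The panel layout, the constant R, and the function name are assumptions for illustration; the patented format itself is not reproduced here.

```c
#include <stdio.h>

#define R 4  /* hypothetical compute-array height */

/* Repack src (rows x cols, row-major) into R-row panels stored so that,
 * within each panel, elements are column-major: panel p, column c holds the
 * R values src[p*R + r][c]. Rows are zero-padded up to a multiple of R, so
 * dst must hold ceil(rows/R) * cols * R floats. */
static void pack_panels(const float *src, float *dst, int rows, int cols)
{
    int panels = (rows + R - 1) / R;
    for (int p = 0; p < panels; ++p)
        for (int c = 0; c < cols; ++c)
            for (int r = 0; r < R; ++r) {
                int src_row = p * R + r;
                dst[(p * cols + c) * R + r] =
                    (src_row < rows) ? src[src_row * cols + c] : 0.0f;
            }
}

int main(void)
{
    float a[6] = {1, 2, 3, 4, 5, 6};  /* a 3 x 2 matrix, row-major */
    float packed[8];                  /* 1 panel x 2 cols x R = 8 values */
    pack_panels(a, packed, 3, 2);
    /* Each consecutive run of R values is one column slice: the accelerator
     * can stream these with strictly linear addresses, loading one slice
     * into the compute array per cycle. Prints: 1 3 5 0 2 4 6 0 */
    for (int i = 0; i < 8; ++i)
        printf("%g ", packed[i]);
    printf("\n");
    return 0;
}
```

Because the packed order matches the order in which the array consumes operands, reads from DDR stay sequential regardless of the original matrix dimensions, which is what lets arbitrarily sized inputs feed a fixed-size array without address-generation stalls.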
-
Publication Number: US20190114548A1
Publication Date: 2019-04-18
Application Number: US15786434
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Yongjun Wu , Jindrich Zejda , Elliott Delaye , Ashish Sirasao
Abstract: Embodiments herein describe techniques for statically scheduling a neural network implemented in a massively parallel hardware system. The neural network may be scheduled using three different scheduling levels referred to herein as an upper level, an intermediate level, and a lower level. In one embodiment, the upper level includes a hardware or software model of the layers in the neural network that establishes a sequential order of functions that operate concurrently in the hardware system. In the intermediate level, identical processes in the functions defined in the upper level are connected to form a systolic array or mesh, and balanced data flow channels are used to minimize latency. In the lower level, a compiler can assign the operations performed by the processing elements in the systolic array to different portions of the hardware system to provide a static schedule for the neural network.
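The intermediate level's systolic array is the most code-friendly part of this description, so here is a small C simulation of a statically scheduled output-stationary systolic array multiplying two matrices: every processing element (PE) executes the same multiply-accumulate on every cycle, and the only "schedule" is the fixed skew with which operands enter the array. The array size, skewing scheme, and variable names are illustrative assumptions, not the patent's design.

```c
#include <stdio.h>

#define N 2  /* hypothetical systolic-array dimension */

int main(void)
{
    float A[N][N] = {{1, 2}, {3, 4}};
    float B[N][N] = {{5, 6}, {7, 8}};
    float C[N][N] = {{0}};  /* per-PE accumulators (output-stationary) */

    /* Values latched in each PE, moving right (a) and down (b) each cycle. */
    float a_reg[N][N] = {{0}}, b_reg[N][N] = {{0}};

    /* Static schedule: row i of A enters from the left delayed by i cycles;
     * column j of B enters from the top delayed by j cycles. Every PE runs
     * the same MAC every cycle; no runtime decisions are made. */
    for (int t = 0; t < 3 * N - 2; ++t) {
        for (int i = N - 1; i >= 0; --i)        /* reverse order so reads  */
            for (int j = N - 1; j >= 0; --j) {  /* see last cycle's values */
                float a_in = (j == 0)
                    ? ((t - i >= 0 && t - i < N) ? A[i][t - i] : 0.0f)
                    : a_reg[i][j - 1];
                float b_in = (i == 0)
                    ? ((t - j >= 0 && t - j < N) ? B[t - j][j] : 0.0f)
                    : b_reg[i - 1][j];
                C[i][j] += a_in * b_in;  /* the PE's only operation */
                a_reg[i][j] = a_in;      /* feeds PE(i, j+1) next cycle */
                b_reg[i][j] = b_in;      /* feeds PE(i+1, j) next cycle */
            }
    }

    /* Prints 19 22 / 43 50, the product of A and B. */
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j)
            printf("%g ", C[i][j]);
        printf("\n");
    }
    return 0;
}
```

Because every PE's behavior at every cycle is fixed in advance, latency is determined entirely by the input skew (here 3N-2 cycles), which is the property the abstract's balanced data flow channels are meant to preserve across the larger design.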
-