-
Publication Number: US10678509B1
Publication Date: 2020-06-09
Application Number: US16106743
Filing Date: 2018-08-21
Applicant: Xilinx, Inc.
Inventor: Sean Settle , Elliott Delaye , Aaron Ng , Ehsan Ghasemi , Ashish Sirasao , Xiao Teng , Jindrich Zejda
Abstract: An example multiply-accumulate (MACC) circuit includes a multiply-accumulator having an accumulator output register, a scaler coupled to the multiply-accumulator, and a control circuit coupled to the multiply-accumulator and the scaler. The control circuit is configured to provide control data to the scaler, the control data indicative of: a most-significant-bit (MSB) to least-significant-bit (LSB) range for selecting bit indices from the accumulator output register, implementing a first right shift; a multiplier; and a second right shift.
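The scaling chain described in this abstract amounts to a fixed-point requantization step: a bit-range selection (the first right shift), an integer multiply, and a second right shift. Below is a minimal Python sketch of that arithmetic; the bit widths and parameter names are illustrative assumptions, not values taken from the patent.

```python
def scale_accumulator(acc: int, msb: int, lsb: int, multiplier: int, shift2: int) -> int:
    """Model the scaler: select bits [msb:lsb] of the accumulator output
    register (a right shift by lsb plus a width limit), multiply, shift again."""
    width = msb - lsb + 1
    selected = (acc >> lsb) & ((1 << width) - 1)  # first right shift + MSB..LSB select
    return (selected * multiplier) >> shift2      # multiplier, then second right shift

# Example: rescale a wide accumulator value toward an 8-bit activation range.
print(scale_accumulator(0x123456, msb=23, lsb=8, multiplier=77, shift2=10))
```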
-
Publication Number: US10354733B1
Publication Date: 2019-07-16
Application Number: US15786321
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Elliott Delaye , Ashish Sirasao , Yongjun Wu , Aaron Ng
Abstract: Methods and apparatus are described for partitioning and reordering block-based matrix multiplications for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). By preloading and hierarchically caching the blocks, examples of the present disclosure reduce the double data rate (DDR) memory intake bandwidth for software-defined GEMM accelerators.
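As a rough illustration of block-based partitioning and reordering, the sketch below computes C = A·B block by block, ordering the loops so that each block of A is loaded once and reused across a whole row of output blocks; the block size and loop order are illustrative assumptions, not the patented schedule.

```python
import numpy as np

def blocked_gemm(A, B, bs=4):
    """Block-based GEMM: reuse each preloaded A-block across every
    B-block in the corresponding row of the output."""
    n, k = A.shape
    _, m = B.shape
    assert n % bs == 0 and k % bs == 0 and m % bs == 0
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, bs):
        for p in range(0, k, bs):
            a_blk = A[i:i+bs, p:p+bs]  # "preload" one block of A, reuse below
            for j in range(0, m, bs):
                C[i:i+bs, j:j+bs] += a_blk @ B[p:p+bs, j:j+bs]
    return C

A = np.arange(64, dtype=np.int64).reshape(8, 8)
B = np.ones((8, 8), dtype=np.int64)
assert np.array_equal(blocked_gemm(A, B), A @ B)
```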
-
Publication Number: US11620490B2
Publication Date: 2023-04-04
Application Number: US15785800
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Elliott Delaye , Ehsan Ghasemi , Xiao Teng , Jindrich Zejda , Yongjun Wu , Sean Settle , Ashish Sirasao
Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
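An instruction package of this kind can be modeled as an ordered list of records, each naming a layer's operation and the offsets of its weight matrix and result in the shared memory. The field names and offsets below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class LayerInstruction:
    """One per-layer instruction: what to compute and where its data lives."""
    layer_index: int
    op: str             # e.g. "conv" or "fc" -- illustrative operation labels
    weight_offset: int  # offset of this layer's weight matrix in shared memory
    output_offset: int  # where this layer's result matrix is written

# Host side: assemble the package, then write it (with input data) to shared memory.
package = [
    LayerInstruction(0, "conv", weight_offset=0x0000, output_offset=0x8000),
    LayerInstruction(1, "fc",   weight_offset=0x4000, output_offset=0xC000),
]

# Accelerator side (modeled): read the package and process instructions in order.
for inst in package:
    print(f"layer {inst.layer_index}: {inst.op}, weights at {inst.weight_offset:#x}")
```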
-
Publication Number: US11429848B2
Publication Date: 2022-08-30
Application Number: US15786102
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Elliott Delaye , Jindrich Zejda , Ashish Sirasao
Abstract: In disclosed approaches of neural network processing, a host computer system copies an input data matrix from host memory to a shared memory for performing neural network operations of a first layer of a neural network by a neural network accelerator. The host instructs the neural network accelerator to perform neural network operations of each layer of the neural network beginning with the input data matrix. The neural network accelerator performs neural network operations of each layer in response to the instruction from the host. The host waits until the neural network accelerator signals completion of performing neural network operations of layer i before instructing the neural network accelerator to commence performing neural network operations of layer i+1, for i≥1. The host instructs the neural network accelerator to use a results data matrix in the shared memory from layer i as an input data matrix for layer i+1 for i≥1.
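The host-side control flow is a synchronous loop: start layer i, block until the accelerator signals completion, then reuse layer i's result matrix as layer i+1's input. A minimal model follows, with a threading.Event standing in for the completion signal; the layer arithmetic is a placeholder.

```python
import threading

done = threading.Event()            # stands in for the completion signal
shared = {"data": [1.0, 2.0, 3.0]}  # shared memory holding the input data matrix

def accelerator_layer(i):
    """Modeled accelerator: read layer i's input from shared memory,
    write the result matrix back, then signal completion."""
    shared["data"] = [2 * x + i for x in shared["data"]]  # placeholder math
    done.set()

for i in range(3):                  # host instructs one layer at a time
    done.clear()
    threading.Thread(target=accelerator_layer, args=(i,)).start()
    done.wait()                     # wait for layer i before starting layer i+1
print(shared["data"])               # layer i's results fed layer i+1 in place
```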
-
Publication Number: US10515135B1
Publication Date: 2019-12-24
Application Number: US15785688
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Elliott Delaye , Aaron Ng , Ashish Sirasao , Yongjun Wu
Abstract: Methods and apparatus are described for performing data-intensive compute algorithms, such as fast massively parallel general matrix multiplication (GEMM), using a particular data format for both storing data to and reading data from memory. This data format may be utilized for arbitrarily-sized input matrices for GEMM implemented on a finite-size GEMM accelerator in the form of a rectangular compute array of digital signal processing (DSP) elements or similar compute cores. This data format solves the issue of double data rate (DDR) dynamic random access memory (DRAM) bandwidth by allowing both linear DDR addressing and single cycle loading of data into the compute array, avoiding input/output (I/O) and/or DDR bottlenecks.
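One way to read the claimed data format is as a block-linearized layout: the matrix is tiled to match the compute array, and each tile is stored contiguously so that tiles stream in with linear DDR addresses. The sketch below performs such a reordering; the tile shape is an illustrative assumption.

```python
import numpy as np

def to_block_linear(M, th, tw):
    """Reorder a row-major matrix into contiguous (th x tw) tiles so each
    tile occupies a single linear address range in memory."""
    r, c = M.shape
    assert r % th == 0 and c % tw == 0
    # Split into tiles, bring tiles to the front, then flatten tile-major.
    return (M.reshape(r // th, th, c // tw, tw)
             .transpose(0, 2, 1, 3)
             .reshape(-1))

M = np.arange(16).reshape(4, 4)
print(to_block_linear(M, 2, 2))  # each 2x2 tile is now contiguous
```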
-
Publication Number: US20190114548A1
Publication Date: 2019-04-18
Application Number: US15786434
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Yongjun Wu , Jindrich Zejda , Elliott Delaye , Ashish Sirasao
Abstract: Embodiments herein describe techniques for static scheduling a neural network implemented in a massively parallel hardware system. The neural network may be scheduled using three different scheduling levels referred to herein as an upper level, an intermediate level, and a lower level. In one embodiment, the upper level includes a hardware or software model of the layers in the neural network that establishes a sequential order of functions that operate concurrently in the hardware system. In the intermediate level, identical processes in the functions defined in the upper level are connected to form a systolic array or mesh and balanced data flow channels are used to minimize latency. In the lower level, a compiler can assign the operations performed by the processing elements in the systolic array to different portions of the hardware system to provide a static schedule for the neural network.
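The upper level's "sequential order of functions that operate concurrently" is essentially a pipeline: later layers work on earlier images while the first layer takes in new ones. The toy static schedule below makes that overlap visible; the layer names and pipeline depth are illustrative.

```python
layers = ["conv1", "pool1", "conv2", "fc"]  # illustrative layer pipeline
images = 6

# Static schedule: at step s, layer index i processes image s - i (if valid),
# so all layers run concurrently on different images.
for step in range(images + len(layers) - 1):
    active = [(name, step - i) for i, name in enumerate(layers)
              if 0 <= step - i < images]
    print(f"step {step}: " + ", ".join(f"{n}(img{t})" for n, t in active))
```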
-
Publication Number: US20190114538A1
Publication Date: 2019-04-18
Application Number: US15786102
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Elliott Delaye , Jindrich Zejda , Ashish Sirasao
IPC: G06N3/08 , G06F9/28 , H03K19/177 , G06F17/16
Abstract: In disclosed approaches of neural network processing, a host computer system copies an input data matrix from host memory to a shared memory for performing neural network operations of a first layer of a neural network by a neural network accelerator. The host instructs the neural network accelerator to perform neural network operations of each layer of the neural network beginning with the input data matrix. The neural network accelerator performs neural network operations of each layer in response to the instruction from the host. The host waits until the neural network accelerator signals completion of performing neural network operations of layer i before instructing the neural network accelerator to commence performing neural network operations of layer i+1, for i≥1. The host instructs the neural network accelerator to use a results data matrix in the shared memory from layer i as an input data matrix for layer i+1 for i≥1.
-
Publication Number: US11568218B2
Publication Date: 2023-01-31
Application Number: US15786288
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Jindrich Zejda , Elliott Delaye , Xiao Teng , Ashish Sirasao
Abstract: A disclosed neural network processing system includes a host computer system, RAMs coupled to the host computer system, and neural network accelerators coupled to the RAMs, respectively. The host computer system is configured with software that, when executed, causes the host computer system to write input data and work requests to the RAMs. Each work request specifies a subset of neural network operations to perform and the memory locations in a RAM of the input data and parameters. A graph of dependencies among the neural network operations is built, and additional dependencies are added. The operations are partitioned into coarse-grain tasks and fine-grain subtasks for optimal scheduling for parallel execution. The subtasks are scheduled to accelerator kernels of matching capabilities. Each neural network accelerator is configured to read a work request from the respective RAM and perform the subset of neural network operations on the input data using the parameters.
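The dependency-graph flow can be modeled as a topological sort over the operations, after which each ready subtask is dispatched to a kernel advertising the matching capability. A minimal sketch follows; the graph, task names, and kernel capabilities are illustrative assumptions.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Dependencies among neural network operations (node -> its prerequisites).
deps = {"conv1": [], "relu1": ["conv1"], "conv2": ["relu1"], "fc": ["conv2"]}
# Which accelerator kernel is capable of which subtasks.
kernels = {"conv": ["conv1", "conv2"], "misc": ["relu1", "fc"]}

for op in TopologicalSorter(deps).static_order():  # dependency-respecting order
    kernel = next(k for k, ops in kernels.items() if op in ops)
    print(f"dispatch {op} -> {kernel} kernel")
```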
-
Publication Number: US11386644B2
Publication Date: 2022-07-12
Application Number: US15786267
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Elliott Delaye , Ashish Sirasao , Aaron Ng , Yongjun Wu , Jindrich Zejda
Abstract: An example preprocessor circuit includes: a first buffer configured to store rows of image data and output a row thereof; a second buffer, coupled to the first buffer, including storage locations to store respective image samples of the row output by the first buffer; shift registers; an interconnect network including connections, each connection coupling a respective one of the shift registers to more than one of the storage locations, one or more of the storage locations being coupled to more than one of the connections; and a control circuit configured to load the shift registers with the image samples based on the connections and shift the shift registers to output streams of image samples.
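Functionally this preprocessor resembles the line-buffer front end of a convolution engine: buffered image rows feed shift registers, which then shift out overlapping windows of samples. A behavioral sketch, with window width and row contents chosen for illustration:

```python
from collections import deque

def stream_windows(row, width=3):
    """Model the shift registers: load samples from a buffered row, then
    shift out every overlapping window of `width` samples as a stream."""
    regs = deque(maxlen=width)   # the shift-register chain
    for sample in row:           # the control circuit loads one sample per shift
        regs.append(sample)
        if len(regs) == width:
            yield tuple(regs)    # one element of the output stream

row = [10, 20, 30, 40, 50]       # one row output by the first buffer
print(list(stream_windows(row))) # [(10,20,30), (20,30,40), (30,40,50)]
```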
-
Publication Number: US11204747B1
Publication Date: 2021-12-21
Application Number: US15786395
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Elliott Delaye , Yongjun Wu , Aaron Ng , Ashish Sirasao , Khang K. Dao , Christopher J. Case
Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator when the two operate on heterogeneous computing systems. For example, the neural network application may execute on a central processing unit (CPU) in a computing system while the neural network accelerator executes on an FPGA. As a result, when moving the software-hardware boundary between the two heterogeneous systems, changes may be made both to the neural network application (using software code) and to the accelerator (using RTL). The embodiments herein describe a software-defined approach where shared interface code is used to express both sides of the interface between the two heterogeneous systems in a single abstraction (e.g., a software class).
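The single-abstraction idea can be sketched as one class that declares the interface once, with a software-only implementation and a hardware-backed implementation behind it, so moving work across the boundary touches a single definition. The class and method names below are illustrative assumptions.

```python
from abc import ABC, abstractmethod

class AcceleratorInterface(ABC):
    """Shared interface code: both sides of the CPU/FPGA boundary are
    expressed against this one abstraction."""
    @abstractmethod
    def run_layer(self, data: list) -> list: ...

class CpuSide(AcceleratorInterface):
    def run_layer(self, data):           # pure-software implementation
        return [2 * x for x in data]

class FpgaSide(AcceleratorInterface):
    def run_layer(self, data):           # would hand data to RTL kernels;
        return [2 * x for x in data]     # modeled here in software

def application(backend: AcceleratorInterface):
    return backend.run_layer([1, 2, 3])  # application code is boundary-agnostic

print(application(CpuSide()), application(FpgaSide()))
```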