-
Publication Number: US10572409B1
Publication Date: 2020-02-25
Application Number: US15976722
Application Date: 2018-05-10
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Ling Liu , Yifei Zhou , Ashish Sirasao
Abstract: A memory arrangement can store a matrix of matrix data elements specified as index-value pairs that indicate row and column indices and associated values. First split-and-merge circuitry is coupled between the memory arrangement and a first set of FIFO buffers for reading the matrix data elements from the memory arrangement and putting the matrix data elements in the first set of FIFO buffers based on column indices. A pairing circuit is configured to read vector data elements, pair the vector data elements with the matrix data elements, and put the paired matrix and vector data elements in a second set of FIFO buffers based on column indices. Second split-and-merge circuitry is configured to read paired matrix and vector data elements from the second set of FIFO buffers and put the paired matrix and vector data elements in a third set of FIFO buffers based on row indices.
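A minimal C++ sketch of the data flow this abstract describes, with the FIFO buffers modeled as std::queue and all structure and variable names invented for illustration: matrix elements stored as (row, column, value) index-value pairs are split into per-column FIFOs, paired with the vector element for their column, then merged into per-row FIFOs and accumulated.

```cpp
#include <cstddef>
#include <iostream>
#include <queue>
#include <vector>

struct MatElem { std::size_t row, col; double val; };  // index-value pair
struct Paired  { std::size_t row; double product; };   // paired matrix*vector element

int main() {
    // Sparse 3x3 matrix in (row, col, value) coordinate form.
    std::vector<MatElem> matrix = {
        {0, 0, 2.0}, {0, 2, 1.0}, {1, 1, 3.0}, {2, 0, 4.0}, {2, 2, 5.0}};
    std::vector<double> vec = {1.0, 2.0, 3.0};

    // First split-and-merge: route matrix elements into per-column FIFOs.
    std::vector<std::queue<MatElem>> colFifos(vec.size());
    for (const auto& e : matrix) colFifos[e.col].push(e);

    // Pairing: each matrix element meets the vector element for its column;
    // the second split-and-merge routes the products into per-row FIFOs.
    std::vector<std::queue<Paired>> rowFifos(3);
    for (std::size_t c = 0; c < colFifos.size(); ++c)
        while (!colFifos[c].empty()) {
            MatElem e = colFifos[c].front(); colFifos[c].pop();
            rowFifos[e.row].push({e.row, e.val * vec[c]});
        }

    // Accumulate per row to form the result vector y = A * x.
    std::vector<double> result(rowFifos.size(), 0.0);
    for (auto& fifo : rowFifos)
        while (!fifo.empty()) { result[fifo.front().row] += fifo.front().product; fifo.pop(); }

    for (double r : result) std::cout << r << '\n';    // expected: 5 6 19
}
```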
-
Publication Number: US20190114499A1
Publication Date: 2019-04-18
Application Number: US15786267
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Elliott Delaye , Ashish Sirasao , Aaron Ng , Yongjun Wu , Jindrich Zejda
Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a first buffer configured to store a plurality of rows of the image data and output a row of the plurality of rows; a second buffer, coupled to the first buffer, including a plurality of storage locations to store a respective plurality of image samples of the row output by the first buffer; a plurality of shift registers; an interconnect network including a plurality of connections, each connection coupling a respective one of the plurality of shift registers to more than one of the plurality of storage locations, one or more of the plurality of storage locations being coupled to more than one of the plurality of connections; and a control circuit configured to load the plurality of shift registers with the plurality of image samples based on the plurality of connections and shift the plurality of shift registers to output the plurality of streams of image samples.
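A hedged software sketch of the buffering idea; the connection pattern, row width, and tap count below are illustrative assumptions, not taken from the patent. A row buffer feeds several tap positions so that parallel streams of image samples emerge, as a convolution front end might consume them.

```cpp
#include <array>
#include <iostream>
#include <vector>

int main() {
    constexpr int kWidth = 8, kTaps = 3;                 // row width, window size
    // "First buffer": a few rows of image data.
    std::vector<std::array<int, kWidth>> rows = {
        {1, 2, 3, 4, 5, 6, 7, 8},
        {9, 10, 11, 12, 13, 14, 15, 16}};

    for (const auto& row : rows) {                       // row emitted by the first buffer
        // "Second buffer" holds this row's samples; the interconnect feeds
        // shift register k from storage locations k, k+1, ..., so stream k
        // presents the samples a kTaps-wide sliding window sees at tap k.
        for (int k = 0; k < kTaps; ++k) {
            std::cout << "stream " << k << ':';
            for (int x = 0; x + kTaps <= kWidth; ++x)
                std::cout << ' ' << row[x + k];
            std::cout << '\n';
        }
    }
}
```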
-
Publication Number: US20190114533A1
Publication Date: 2019-04-18
Application Number: US15785679
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Jindrich Zejda , Elliott Delaye , Xiao Teng , Sonal Santan , Soren T. Soe , Ashish Sirasao , Ehsan Ghasemi , Sean Settle
Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., an FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage, each of which corresponds to a different thread. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the task. Because the stages correspond to different threads, the library can process multiple packets in parallel, which can increase the utilization of the neural network accelerator on the hardware system.
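A rough illustration, not the library's actual API, of the three-stage threaded pipeline described above: pre-processing, execution, and post-processing run on separate threads connected by thread-safe queues, so several packets can be in flight at once. All names are invented.

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

struct Packet { int id; int data; };              // carries per-task info for all stages

template <typename T>
class Channel {                                   // minimal thread-safe FIFO
    std::queue<std::optional<T>> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(std::optional<T> v) {
        { std::lock_guard<std::mutex> l(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    std::optional<T> pop() {                      // blocks until an item arrives
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [&] { return !q_.empty(); });
        auto v = std::move(q_.front()); q_.pop();
        return v;
    }
};

int main() {
    Channel<Packet> toExec, toPost;
    std::thread exec([&] {                        // "FPGA execution" stage
        while (auto p = toExec.pop()) { p->data *= 2; toPost.push(*p); }
        toPost.push(std::nullopt);                // forward the shutdown sentinel
    });
    std::thread post([&] {                        // post-processing stage
        while (auto p = toPost.pop())
            std::cout << "task " << p->id << " -> " << p->data << '\n';
    });
    for (int i = 0; i < 4; ++i)                   // pre-processing stage (main thread)
        toExec.push(Packet{i, i + 10});
    toExec.push(std::nullopt);                    // no more tasks
    exec.join();
    post.join();
}
```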
-
Publication Number: US09842187B1
Publication Date: 2017-12-12
Application Number: US15082993
Application Date: 2016-03-28
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Atul Srinivasan , Ilya K. Ganusov , Walter A. Manaker, Jr. , Benjamin S. Devlin , Satish B. Sivaswamy
IPC: G06F17/50
CPC classification number: G06F17/5081 , G06F17/5054 , G06F2217/84
Abstract: Approaches for processing a circuit design include determining pin slack values for pins of the circuit elements in the circuit design. A processor selects a subset of endpoints based on pin slack values of the endpoints being in a critical slack range and determines startpoints of the circuit design that are in respective critical fanin cones. For each endpoint of the subset, the processor determines an arrival time from each startpoint in the respective critical fanin cone and determines for each startpoint-endpoint pair, a respective set of constraint values as a function of the respective arrival time from the startpoint. The processor generates a graph in the memory circuit from the startpoint-endpoint pairs. First nodes in the graph represent the startpoints and second nodes in the graph represent the endpoints, and values in the respective set of constraint values are associated with edges that connect the nodes.
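A toy software rendering of the described flow, with an assumed slack threshold and a placeholder constraint function: endpoints whose slack falls in the critical range are selected, and each startpoint-endpoint edge is annotated with a constraint value derived from the arrival time.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Endpoint { std::string name; double slack; };

int main() {
    const double criticalSlack = 0.5;                       // assumed threshold (ns)
    std::vector<Endpoint> endpoints = {{"ep0", 0.2}, {"ep1", 1.4}, {"ep2", -0.1}};
    // Arrival time at each endpoint from each startpoint in its critical fanin cone.
    std::map<std::pair<std::string, std::string>, double> arrival = {
        {{"sp0", "ep0"}, 3.1}, {{"sp1", "ep0"}, 2.7}, {{"sp0", "ep2"}, 3.6}};

    // Graph edges: (startpoint, endpoint) -> constraint value on that edge.
    std::map<std::pair<std::string, std::string>, double> edges;
    for (const auto& ep : endpoints) {
        if (ep.slack > criticalSlack) continue;             // outside critical slack range
        for (const auto& [key, t] : arrival)
            if (key.second == ep.name)
                edges[key] = t + ep.slack;                  // placeholder constraint function
    }
    for (const auto& [key, c] : edges)
        std::cout << key.first << " -> " << key.second << " : " << c << '\n';
}
```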
-
Publication Number: US12086572B1
Publication Date: 2024-09-10
Application Number: US15786452
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Yongjun Wu , Jindrich Zejda , Elliott Delaye , Ashish Sirasao
CPC classification number: G06F8/313 , G06F8/47 , G06F12/0646 , G06N3/04 , G06N20/00
Abstract: Embodiments herein describe techniques for expressing the layers of a neural network in a software model. In one embodiment, the software model includes a class that describes the various functional blocks (e.g., convolution units, max-pooling units, rectified linear units (ReLU), and scaling functions) used to execute the neural network layers. In turn, other classes in the software model can describe the operation of each of the functional blocks. In addition, the software model can include conditional logic for expressing how the data flows between the functional blocks since different layers in the neural network can process the data differently. A compiler can convert the high-level code in the software model (e.g., C++) into a hardware description language (e.g., register transfer level (RTL)) which is used to configure a hardware system to implement a neural network accelerator.
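A small sketch of the software-model style the abstract suggests, with invented class and member names: one class per functional block, and per-layer conditional logic steering data between blocks. An HLS compiler could translate code written in this style into RTL; the arithmetic here is a stand-in.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

using Tensor = std::vector<float>;

// One class per functional block, as the abstract's software model suggests.
struct Conv {                                     // stand-in convolution (scales by 0.5)
    Tensor run(const Tensor& x) { Tensor y(x); for (auto& v : y) v *= 0.5f; return y; }
};
struct Relu {                                     // rectified linear unit
    Tensor run(const Tensor& x) { Tensor y(x); for (auto& v : y) v = std::max(v, 0.0f); return y; }
};
struct MaxPool {                                  // 1-D max pooling, window of 2
    Tensor run(const Tensor& x) {
        Tensor y;
        for (std::size_t i = 0; i + 1 < x.size(); i += 2) y.push_back(std::max(x[i], x[i + 1]));
        return y;
    }
};

struct LayerConfig { bool useRelu; bool usePool; };  // drives the conditional dataflow

int main() {
    Conv conv; Relu relu; MaxPool pool;
    std::vector<LayerConfig> layers = {{true, false}, {true, true}};
    Tensor data = {-2.0f, 4.0f, 6.0f, -8.0f};
    for (const auto& cfg : layers) {              // different layers route data differently
        data = conv.run(data);
        if (cfg.useRelu) data = relu.run(data);
        if (cfg.usePool) data = pool.run(data);
    }
    for (float v : data) std::cout << v << ' ';
    std::cout << '\n';
}
```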
-
Publication Number: US12061990B2
Publication Date: 2024-08-13
Application Number: US15786434
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Yongjun Wu , Jindrich Zejda , Elliott Delaye , Ashish Sirasao
Abstract: Embodiments herein describe techniques for statically scheduling a neural network implemented in a massively parallel hardware system. The neural network may be scheduled using three different scheduling levels referred to herein as an upper level, an intermediate level, and a lower level. In one embodiment, the upper level includes a hardware or software model of the layers in the neural network that establishes a sequential order of functions that operate concurrently in the hardware system. In the intermediate level, identical processes in the functions defined in the upper level are connected to form a systolic array or mesh, and balanced data flow channels are used to minimize latency. In the lower level, a compiler can assign the operations performed by the processing elements in the systolic array to different portions of the hardware system to provide a static schedule for the neural network.
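A toy illustration of a lower-level static schedule; the PE count, latency, and round-robin policy are assumptions, not the patent's method. Operations are assigned to processing elements with fixed start cycles, so the whole schedule is known at compile time.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct Slot { std::string op; int pe; int startCycle; };

int main() {
    const int numPEs = 2, latency = 3;               // assumed PE count and op latency
    std::vector<std::string> ops = {"mul0", "mul1", "add0", "mul2", "add1"};
    std::vector<Slot> schedule;
    std::vector<int> nextFree(numPEs, 0);            // next free cycle per PE
    for (std::size_t i = 0; i < ops.size(); ++i) {
        int pe = static_cast<int>(i % numPEs);       // round-robin PE assignment
        schedule.push_back({ops[i], pe, nextFree[pe]});
        nextFree[pe] += latency;                     // PE is busy for `latency` cycles
    }
    for (const auto& s : schedule)
        std::cout << s.op << " -> PE" << s.pe << " @ cycle " << s.startCycle << '\n';
}
```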
-
Publication Number: US11694066B2
Publication Date: 2023-07-04
Application Number: US15785679
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Jindrich Zejda , Elliott Delaye , Xiao Teng , Sonal Santan , Soren T. Soe , Ashish Sirasao , Ehsan Ghasemi , Sean Settle
Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., an FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage, each of which corresponds to a different thread. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the task. Because the stages correspond to different threads, the library can process multiple packets in parallel, which can increase the utilization of the neural network accelerator on the hardware system.
-
Publication Number: US11036827B1
Publication Date: 2021-06-15
Application Number: US15786346
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Elliott Delaye , Yongjun Wu , Aaron Ng , Ashish Sirasao , Khang K. Dao
Abstract: Methods and apparatus are described for simultaneously buffering and reformatting (e.g., transposing) a matrix for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). Examples of the present disclosure increase the effective double data rate (DDR) memory throughput for streaming data into a GEMM digital signal processing (DSP) engine multifold, as well as eliminate slow data reformatting on a host central processing unit (CPU). This may be accomplished through software-defined (e.g., C++) data structures and access patterns that result in hardware logic that simultaneously buffers and reorganizes the data to achieve linear DDR addressing.
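A sketch of the buffer-and-reformat idea under assumed matrix and tile sizes; the actual hardware layout is not specified here. A row-major matrix is rewritten into contiguous column-major tiles so a streaming engine can read each tile with purely linear addresses instead of strided DDR accesses.

```cpp
#include <iostream>
#include <vector>

int main() {
    constexpr int kRows = 4, kCols = 4, kTile = 2;
    std::vector<int> src(kRows * kCols);                   // row-major source matrix
    for (int i = 0; i < kRows * kCols; ++i) src[i] = i;

    std::vector<int> dst;                                  // tiles laid out back-to-back
    dst.reserve(src.size());
    for (int tr = 0; tr < kRows; tr += kTile)
        for (int tc = 0; tc < kCols; tc += kTile)
            for (int c = tc; c < tc + kTile; ++c)          // column-major inside a tile
                for (int r = tr; r < tr + kTile; ++r)
                    dst.push_back(src[r * kCols + c]);     // transpose while buffering

    for (int v : dst) std::cout << v << ' ';               // engine reads dst linearly
    std::cout << '\n';
}
```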
-
Publication Number: US10943039B1
Publication Date: 2021-03-09
Application Number: US15786105
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Ashish Sirasao , Elliott Delaye , Sean Settle , Zhao Ma , Ehsan Ghasemi , Xiao Teng , Aaron Ng , Jindrich Zejda
IPC: G06F30/327 , G06F7/544 , G06N3/04 , G06F30/34
Abstract: An example multiply-accumulate (MACC) circuit includes: a multiply-accumulator having an accumulator output register; a quantizer coupled to the multiply-accumulator; and a control circuit coupled to the multiply-accumulator and the quantizer, the control circuit configured to provide control data to the quantizer, the control data indicative of a most-significant bit (MSB) to least-significant bit (LSB) range for selecting bit indices from the accumulator output register.
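A toy model of the quantizer step, omitting rounding and saturation: the control data names an MSB-to-LSB bit window, and the quantizer keeps only those bit indices of the accumulator output, i.e., a right shift plus a mask.

```cpp
#include <cstdint>
#include <iostream>

// Keep bit indices [msb:lsb] (inclusive) of the accumulator output register.
uint32_t quantize(uint64_t acc, unsigned msb, unsigned lsb) {
    unsigned width = msb - lsb + 1;
    uint64_t mask = (width >= 64) ? ~0ull : ((1ull << width) - 1);
    return static_cast<uint32_t>((acc >> lsb) & mask);
}

int main() {
    uint64_t acc = 0;
    const int a[] = {3, 5, 7}, b[] = {2, 4, 6};
    for (int i = 0; i < 3; ++i)
        acc += static_cast<uint64_t>(a[i]) * static_cast<uint64_t>(b[i]);  // multiply-accumulate
    std::cout << "acc = " << acc << '\n';                      // 3*2 + 5*4 + 7*6 = 68
    std::cout << "q[7:2] = " << quantize(acc, 7, 2) << '\n';   // (68 >> 2) & 0x3f = 17
}
```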
-
Publication Number: US20190114535A1
Publication Date: 2019-04-18
Application Number: US15786288
Application Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Jindrich Zejda , Elliott Delaye , Xiao Teng , Ashish Sirasao
Abstract: A disclosed neural network processing system includes a host computer system, RAMs coupled to the host computer system, and neural network accelerators coupled to the RAMs, respectively. The host computer system is configured with software that, when executed, causes the host computer system to write input data and work requests to the RAMs. Each work request specifies a subset of neural network operations to perform and memory locations in a RAM of the input data and parameters. A graph of dependencies among neural network operations is built and additional dependencies added. The operations are partitioned into coarse-grain tasks and fine-grain subtasks for optimal scheduling for parallel execution. The subtasks are scheduled to accelerator kernels of matching capabilities. Each neural network accelerator is configured to read a work request from the respective RAM and perform the subset of neural network operations on the input data using the parameters.
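An illustrative host-side sketch with invented field names and addresses: work requests name an operation and buffer locations, and a dependency graph is ordered with Kahn's algorithm so that subtasks whose dependencies are satisfied can be dispatched in parallel to accelerators.

```cpp
#include <cstddef>
#include <iostream>
#include <queue>
#include <string>
#include <vector>

struct WorkRequest { std::string op; int inputAddr; int paramAddr; };  // invented fields

int main() {
    std::vector<WorkRequest> subtasks = {
        {"conv1", 0x0000, 0x1000}, {"conv2", 0x0400, 0x1400},
        {"pool", 0x0800, 0x1800}, {"fc", 0x0c00, 0x1c00}};
    // deps[i] lists subtasks that must finish before subtask i may start.
    std::vector<std::vector<int>> deps = {{}, {}, {0, 1}, {2}};

    // Kahn's algorithm: dispatch subtasks as their dependencies are satisfied;
    // everything popped from `ready` in the same wave could run in parallel.
    std::vector<int> remaining(subtasks.size(), 0);
    std::vector<std::vector<int>> dependents(subtasks.size());
    for (std::size_t i = 0; i < deps.size(); ++i) {
        remaining[i] = static_cast<int>(deps[i].size());
        for (int d : deps[i]) dependents[d].push_back(static_cast<int>(i));
    }
    std::queue<int> ready;
    for (std::size_t i = 0; i < remaining.size(); ++i)
        if (remaining[i] == 0) ready.push(static_cast<int>(i));
    while (!ready.empty()) {
        int t = ready.front(); ready.pop();
        std::cout << "dispatch " << subtasks[t].op << " (input @ 0x"
                  << std::hex << subtasks[t].inputAddr << std::dec << ")\n";
        for (int d : dependents[t])
            if (--remaining[d] == 0) ready.push(d);
    }
}
```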
-