-
Publication Number: US10354733B1
Publication Date: 2019-07-16
Application Number: US15786321
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Elliott Delaye , Ashish Sirasao , Yongjun Wu , Aaron Ng
Abstract: Methods and apparatus are described for partitioning and reordering block-based matrix multiplications for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). By preloading and hierarchically caching the blocks, examples of the present disclosure reduce the double data rate (DDR) memory bandwidth consumed by software-defined GEMM accelerators.
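The block-reuse idea in this abstract can be sketched in a few lines of C++. The tile size and loop order below are illustrative assumptions, not the patented scheme; the point is that fixing a block traversal order lets a cached block be reused across many inner iterations instead of being refetched from DDR.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Minimal blocked GEMM sketch: C += A * B, with A (M x K), B (K x N),
// all row-major. The i0/k0/j0 ordering keeps one tile of A cached and
// reused across an entire row of C tiles, which stands in for the
// "preload and hierarchically cache blocks" idea. T is an assumed
// tile size; C must be zero-initialized by the caller.
constexpr std::size_t T = 64;

void blocked_gemm(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C,
                  std::size_t M, std::size_t K, std::size_t N) {
    for (std::size_t i0 = 0; i0 < M; i0 += T)
        for (std::size_t k0 = 0; k0 < K; k0 += T)
            for (std::size_t j0 = 0; j0 < N; j0 += T)
                for (std::size_t i = i0; i < std::min(i0 + T, M); ++i)
                    for (std::size_t k = k0; k < std::min(k0 + T, K); ++k) {
                        const float a = A[i * K + k];  // reused tile element
                        for (std::size_t j = j0; j < std::min(j0 + T, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```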
-
Publication Number: US12105716B2
Publication Date: 2024-10-01
Application Number: US15632082
Filing Date: 2017-06-23
Applicant: Xilinx, Inc.
Inventor: Hare K. Verma , Sonal Santan , Yongjun Wu
IPC: G06F16/00 , G06F3/06 , G06F9/38 , G06F16/245 , G06F16/2453 , G06F16/2455
CPC classification number: G06F16/24557 , G06F3/064 , G06F9/3885 , G06F16/2453 , G06F16/24532 , G06F16/24569
Abstract: Embodiments herein describe techniques for preparing and executing tasks related to a database query in a database accelerator. In one embodiment, the database accelerator is separate from a host CPU. A database management system (DBMS) can offload tasks corresponding to a database query to the database accelerator. The DBMS can request data from the database relevant to the query and then convert that data into one or more data blocks that are suitable for processing by the database accelerator. In one embodiment, the database accelerator contains individual hardware processing units (PUs) that can process data in parallel or concurrently. In order to process the data concurrently, the data block includes individual PU data blocks that are each intended for a respective PU in the database accelerator.
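As a rough illustration of the per-PU packing described above, the C++ sketch below splits query-relevant rows into sub-blocks, one per processing unit, so the PUs can scan them concurrently. The struct layout, PU count, and row encoding are assumptions for illustration; the patent does not publish this format.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int NUM_PUS = 8;  // assumed number of hardware processing units

// Sub-block destined for one PU: fixed-width rows it can scan on its own.
struct PuDataBlock {
    std::vector<std::uint64_t> rows;
};

// A data block handed to the accelerator: a header plus one sub-block
// per PU, so all PUs can process their slices concurrently.
struct DataBlock {
    std::uint32_t num_rows = 0;
    PuDataBlock pu[NUM_PUS];
};

// Round-robin the query-relevant rows across the PU sub-blocks so each
// PU receives a balanced, independent share of the work.
DataBlock pack_for_accelerator(const std::vector<std::uint64_t>& rows) {
    DataBlock block;
    block.num_rows = static_cast<std::uint32_t>(rows.size());
    for (std::size_t i = 0; i < rows.size(); ++i)
        block.pu[i % NUM_PUS].rows.push_back(rows[i]);
    return block;
}
```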
-
Publication Number: US20180373760A1
Publication Date: 2018-12-27
Application Number: US15632082
Filing Date: 2017-06-23
Applicant: Xilinx, Inc.
Inventor: Hare K. Verma , Sonal Santan , Yongjun Wu
Abstract: Embodiments herein describe techniques for preparing and executing tasks related to a database query in a database accelerator. In one embodiment, the database accelerator is separate from a host CPU. A database management system (DBMS) can offload tasks corresponding to a database query to the database accelerator. The DBMS can request data from the database relevant to the query and then convert that data into one or more data blocks that are suitable for processing by the database accelerator. In one embodiment, the database accelerator contains individual hardware processing units (PUs) that can process data in parallel or concurrently. In order to process the data concurrently, the data block includes individual PU data blocks that are each intended for a respective PU in the database accelerator.
-
Publication Number: US11386644B2
Publication Date: 2022-07-12
Application Number: US15786267
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Elliott Delaye , Ashish Sirasao , Aaron Ng , Yongjun Wu , Jindrich Zejda
Abstract: An example preprocessor circuit includes: a first buffer configured to store rows of image data and output a row thereof; a second buffer, coupled to the first buffer, including storage locations to store respective image samples of the row output by the first buffer; shift registers; an interconnect network including connections, each connection coupling a respective one of the shift registers to more than one of the storage locations, one or more of the storage locations being coupled to more than one of the connections; and a control circuit configured to load the shift registers with the image samples based on the connections and shift the shift registers to output streams of image samples.
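A behavioral C++ model can make this dataflow concrete: a buffered row is loaded in parallel into shift registers through a fixed connection table (standing in for the interconnect network, with some storage locations tapped by several connections), and the registers are then shifted to emit sample streams. All sizes and the connection table are illustrative assumptions.

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr int ROW_W    = 16;  // samples per buffered row (assumed)
constexpr int NUM_SR   = 4;   // parallel output streams (assumed)
constexpr int SR_DEPTH = 3;   // taps per shift register (assumed)

using Row = std::array<std::uint8_t, ROW_W>;

// conn[s][t] = sample-buffer location feeding tap t of shift register s.
// Overlapping indices model storage locations shared by connections.
constexpr int conn[NUM_SR][SR_DEPTH] = {
    {0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5}};

// Load the shift registers from one buffered row in parallel, then shift
// them to produce NUM_SR streams of SR_DEPTH samples each.
std::vector<std::vector<std::uint8_t>> stream_row(const Row& samples) {
    std::array<std::array<std::uint8_t, SR_DEPTH>, NUM_SR> sr{};
    for (int s = 0; s < NUM_SR; ++s)
        for (int t = 0; t < SR_DEPTH; ++t)
            sr[s][t] = samples[conn[s][t]];  // parallel load via connections
    std::vector<std::vector<std::uint8_t>> streams(NUM_SR);
    for (int t = 0; t < SR_DEPTH; ++t)       // shift out one sample per cycle
        for (int s = 0; s < NUM_SR; ++s)
            streams[s].push_back(sr[s][t]);
    return streams;
}
```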
-
Publication Number: US11204747B1
Publication Date: 2021-12-21
Application Number: US15786395
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Elliott Delaye , Yongjun Wu , Aaron Ng , Ashish Sirasao , Khang K. Dao , Christopher J. Case
Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator, where the two operate on heterogeneous computing systems. For example, the neural network application may execute on a central processing unit (CPU) in a computing system while the neural network accelerator executes on an FPGA. As a result, moving the software-hardware boundary between the two heterogeneous systems requires changes to both the neural network application (in software code) and the accelerator (in RTL). The embodiments herein describe a software-defined approach in which shared interface code expresses both sides of the interface between the two heterogeneous systems in a single abstraction (e.g., a software class).
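A very rough C++ sketch of the "single abstraction" idea follows: one class holds the layout of the shared buffer at the software-hardware boundary, and both the host code and the accelerator-side address logic are written against it, so moving the boundary changes one definition rather than two. The class and its members are invented for illustration and are not the patent's actual API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One shared description of the software/hardware boundary. Both sides
// index the shared DDR buffer through the same Layout, so the interface
// is expressed once instead of separately in host code and RTL.
class AccelInterface {
public:
    struct Layout {
        std::size_t offset;  // element offset into the shared buffer
        std::size_t count;   // number of 16-bit elements
    };

    // Host side: pack input activations into the shared buffer.
    static void write_input(std::vector<std::int16_t>& shared,
                            const std::vector<std::int16_t>& in,
                            const Layout& l) {
        for (std::size_t i = 0; i < l.count && i < in.size(); ++i)
            shared[l.offset + i] = in[i];
    }

    // Accelerator side: the same Layout drives address generation in the
    // (HLS-produced) hardware, so moving the boundary edits only this class.
    static std::int16_t read_element(const std::vector<std::int16_t>& shared,
                                     const Layout& l, std::size_t i) {
        return shared[l.offset + i];
    }
};
```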
-
Publication Number: US20190114499A1
Publication Date: 2019-04-18
Application Number: US15786267
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Elliott Delaye , Ashish Sirasao , Aaron Ng , Yongjun Wu , Jindrich Zejda
Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a first buffer configured to store a plurality of rows of the image data and output a row of the plurality of rows; a second buffer, coupled to the first buffer, including a plurality of storage locations to store a respective plurality of image samples of the row output by the first buffer; a plurality of shift registers; an interconnect network including a plurality of connections, each connection coupling a respective one of the plurality of shift registers to more than one of the plurality of storage locations, one or more of the plurality of storage locations being coupled to more than one of the plurality of connections; and a control circuit configured to load the plurality of shift registers with the plurality of image samples based on the plurality of connections and shift the plurality of shift registers to output the plurality of streams of image samples.
-
Publication Number: US12086572B1
Publication Date: 2024-09-10
Application Number: US15786452
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Yongjun Wu , Jindrich Zejda , Elliott Delaye , Ashish Sirasao
CPC classification number: G06F8/313 , G06F8/47 , G06F12/0646 , G06N3/04 , G06N20/00
Abstract: Embodiments herein describe techniques for expressing the layers of a neural network in a software model. In one embodiment, the software model includes a class that describes the various functional blocks (e.g., convolution units, max-pooling units, rectified linear units (ReLU), and scaling functions) used to execute the neural network layers. In turn, other classes in the software model can describe the operation of each of the functional blocks. In addition, the software model can include conditional logic for expressing how the data flows between the functional blocks since different layers in the neural network can process the data differently. A compiler can convert the high-level code in the software model (e.g., C++) into a hardware description language (e.g., register transfer level (RTL)) which is used to configure a hardware system to implement a neural network accelerator.
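In the spirit of this abstract, the C++ sketch below expresses functional blocks as classes and uses conditional logic in a layer class to describe how data flows between them. The block set and names are illustrative assumptions; an HLS flow would compile code of this shape into RTL.

```cpp
#include <vector>

// A minimal tensor carrier for the sketch.
struct Tensor { std::vector<float> data; };

// Functional block: rectified linear unit.
struct ReLU {
    Tensor run(Tensor t) {
        for (float& v : t.data) if (v < 0.0f) v = 0.0f;
        return t;
    }
};

// Functional block: scaling function.
struct Scale {
    float factor;
    Tensor run(Tensor t) {
        for (float& v : t.data) v *= factor;
        return t;
    }
};

// A layer description: conditional logic expresses how data flows
// between the blocks, since different layers use different subsets.
struct Layer {
    bool use_relu;
    bool use_scale;
    ReLU relu;
    Scale scale{0.5f};  // assumed scaling factor

    Tensor forward(Tensor t) {
        if (use_relu)  t = relu.run(t);
        if (use_scale) t = scale.run(t);
        return t;
    }
};
```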
-
Publication Number: US12061990B2
Publication Date: 2024-08-13
Application Number: US15786434
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Yongjun Wu , Jindrich Zejda , Elliott Delaye , Ashish Sirasao
Abstract: Embodiments herein describe techniques for statically scheduling a neural network implemented in a massively parallel hardware system. The neural network may be scheduled using three different scheduling levels, referred to herein as the upper level, the intermediate level, and the lower level. In one embodiment, the upper level includes a hardware or software model of the layers in the neural network that establishes a sequential order of functions that operate concurrently in the hardware system. In the intermediate level, identical processes in the functions defined in the upper level are connected to form a systolic array or mesh, and balanced data flow channels are used to minimize latency. In the lower level, a compiler can assign the operations performed by the processing elements in the systolic array to different portions of the hardware system to provide a static schedule for the neural network.
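The lower scheduling level can be loosely illustrated in C++: a toy static scheduler that assigns each processing-element operation a hardware region and start time before the design ever runs. The greedy policy and cost model are assumptions for illustration, not the patent's algorithm.

```cpp
#include <vector>

// One operation a processing element must perform, with an assumed
// cycle cost from a simple model.
struct Op { int layer; int pe; int cycles; };

// Where and when the scheduler placed an operation.
struct Slot { int region; int start; };

// Greedy static scheduler: place each operation on the hardware region
// that frees up earliest. The whole schedule is fixed at compile time,
// before the design runs.
std::vector<Slot> schedule(const std::vector<Op>& ops, int regions) {
    std::vector<int> free_at(regions, 0);
    std::vector<Slot> out;
    for (const Op& op : ops) {
        int best = 0;
        for (int r = 1; r < regions; ++r)
            if (free_at[r] < free_at[best]) best = r;
        out.push_back({best, free_at[best]});
        free_at[best] += op.cycles;
    }
    return out;
}
```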
-
Publication Number: US11036827B1
Publication Date: 2021-06-15
Application Number: US15786346
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Jindrich Zejda , Elliott Delaye , Yongjun Wu , Aaron Ng , Ashish Sirasao , Khang K. Dao
Abstract: Methods and apparatus are described for simultaneously buffering and reformatting (e.g., transposing) a matrix for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). Examples of the present disclosure increase the effective double data rate (DDR) memory throughput for streaming data into a GEMM digital signal processing (DSP) engine severalfold, and also eliminate slow data reformatting on a host central processing unit (CPU). This may be accomplished through software-defined (e.g., C++) data structures and access patterns that result in hardware logic that simultaneously buffers and reorganizes the data to achieve linear DDR addressing.
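The buffer-while-transposing idea can be sketched in C++: a tile streams out of DDR in linear, row-major order and is written into an on-chip buffer at transposed coordinates, so the GEMM engine reads the transpose without a separate host-side reformatting pass. The tile size and data structure are illustrative assumptions.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t TILE = 32;  // assumed on-chip tile size

// On-chip buffer that transposes as it fills: rows arrive in the order
// DDR delivers them, but each sample is stored at swapped coordinates.
struct TransposingBuffer {
    float buf[TILE][TILE] = {};

    // Consume one linearly addressed row of the incoming tile and store
    // it as a column, performing the transpose during buffering.
    void push_row(std::size_t r, const float* row) {
        for (std::size_t c = 0; c < TILE; ++c)
            buf[c][r] = row[c];
    }
};

// Load one TILE x TILE tile from a row-major matrix with 'cols' columns.
// Every DDR read below is a contiguous, linear burst; the reorganization
// happens inside the buffer rather than on the host CPU.
void load_tile(const std::vector<float>& m, std::size_t cols,
               std::size_t r0, std::size_t c0, TransposingBuffer& tb) {
    for (std::size_t r = 0; r < TILE; ++r)
        tb.push_row(r, &m[(r0 + r) * cols + c0]);
}
```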
-
Publication Number: US20190114529A1
Publication Date: 2019-04-18
Application Number: US15785800
Filing Date: 2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Elliott Delaye , Ehsan Ghasemi , Xiao Teng , Jindrich Zejda , Yongjun Wu , Sean Settle , Ashish Sirasao
IPC: G06N3/04
Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
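The instruction package described above might be modeled as in the C++ sketch below: one fixed-format entry per layer, each carrying an opcode and the offsets of that layer's weights and buffers in the shared memory. Field names and widths are assumptions for illustration, not the patented encoding.

```cpp
#include <cstdint>
#include <vector>

// One per-layer instruction: how to process a layer and where its
// weight matrix and data buffers sit in the shared memory.
struct LayerInstruction {
    std::uint32_t layer_id;
    std::uint32_t op_code;        // e.g. convolution vs. fully connected
    std::uint64_t weight_offset;  // byte offset of the weights
    std::uint64_t input_offset;   // where this layer reads its input
    std::uint64_t output_offset;  // where this layer writes its output
};

// The package the host assembles and writes to shared memory; the
// accelerator reads it back and processes the entries in order.
struct InstructionPackage {
    std::uint32_t num_layers = 0;
    std::vector<LayerInstruction> layers;
};

InstructionPackage assemble(const std::vector<LayerInstruction>& per_layer) {
    InstructionPackage pkg;
    pkg.layers = per_layer;
    pkg.num_layers = static_cast<std::uint32_t>(per_layer.size());
    return pkg;
}
```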