Parallel compute offload to database accelerator

    Publication No.: US12105716B2

    Publication date: 2024-10-01

    Application No.: US15632082

    Filing date: 2017-06-23

    Applicant: Xilinx, Inc.

    Abstract: Embodiments herein describe techniques for preparing and executing tasks related to a database query in a database accelerator. In one embodiment, the database accelerator is separate from a host CPU. A database management system (DBMS) can offload tasks corresponding to a database query to the database accelerator. The DBMS can request data from the database relevant to the query and then convert that data into one or more data blocks that are suitable for processing by the database accelerator. In one embodiment, the database accelerator contains individual hardware processing units (PUs) that can process data in parallel or concurrently. In order to process the data concurrently, the data block includes individual PU data blocks that are each intended for a respective PU in the database accelerator.
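The block layout described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the round-robin split, the `pu_task` filter-and-sum, and the use of a thread pool to stand in for hardware PUs are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def build_data_block(rows, num_pus):
    """Split rows into one PU data block per processing unit (round-robin, assumed)."""
    return [rows[i::num_pus] for i in range(num_pus)]


def pu_task(pu_block):
    """Hypothetical per-PU task: sum the even values in this PU's block."""
    return sum(v for v in pu_block if v % 2 == 0)


def offload(rows, num_pus=4):
    """Build the data block, run every PU data block concurrently, merge the partials."""
    data_block = build_data_block(rows, num_pus)
    with ThreadPoolExecutor(max_workers=num_pus) as pool:
        partials = list(pool.map(pu_task, data_block))
    return sum(partials)
```

Calling `offload(list(range(10)))` splits ten rows across four simulated PUs and combines their partial sums on the host side.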

    PARALLEL COMPUTE OFFLOAD TO DATABASE ACCELERATOR

    Publication No.: US20180373760A1

    Publication date: 2018-12-27

    Application No.: US15632082

    Filing date: 2017-06-23

    Applicant: Xilinx, Inc.

    Abstract: Embodiments herein describe techniques for preparing and executing tasks related to a database query in a database accelerator. In one embodiment, the database accelerator is separate from a host CPU. A database management system (DBMS) can offload tasks corresponding to a database query to the database accelerator. The DBMS can request data from the database relevant to the query and then convert that data into one or more data blocks that are suitable for processing by the database accelerator. In one embodiment, the database accelerator contains individual hardware processing units (PUs) that can process data in parallel or concurrently. In order to process the data concurrently, the data block includes individual PU data blocks that are each intended for a respective PU in the database accelerator.

    Image preprocessing for generalized image processing

    Publication No.: US11386644B2

    Publication date: 2022-07-12

    Application No.: US15786267

    Filing date: 2017-10-17

    Applicant: Xilinx, Inc.

    Abstract: An example preprocessor circuit includes: a first buffer configured to store rows of image data and output a row thereof; a second buffer, coupled to the first buffer, including storage locations to store respective image samples of the row output by the first buffer; shift registers; an interconnect network including connections, each connection coupling a respective one of the shift registers to more than one of the storage locations, one or more of the storage locations being coupled to more than one of the connections; and a control circuit configured to load the shift registers with the image samples based on the connections and shift the shift registers to output streams of image samples.
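A small software model may help picture the load-and-shift behavior in this abstract. It is a sketch under stated assumptions: the sliding-window connection pattern in `make_connections`, and the names `make_connections` and `stream_row`, are hypothetical and are not taken from the patent.

```python
from collections import deque


def make_connections(num_regs, window, stride):
    """Assumed interconnect: register j is wired to `window` adjacent storage locations."""
    return [list(range(j * stride, j * stride + window)) for j in range(num_regs)]


def stream_row(row, connections):
    """Load each shift register through its connection, then shift out one sample per cycle."""
    second_buffer = list(row)  # storage locations holding the samples of one row
    regs = [deque(second_buffer[i] for i in conn) for conn in connections]
    streams = [[] for _ in regs]
    depth = max((len(r) for r in regs), default=0)
    for _ in range(depth):  # one shift per cycle
        for reg, out in zip(regs, streams):
            if reg:
                out.append(reg.popleft())
    return streams
```

With `make_connections(3, 3, 1)`, storage locations 1 to 3 each feed more than one register, which mirrors the overlap the abstract describes between connections and storage locations.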

    IMAGE PREPROCESSING FOR GENERALIZED IMAGE PROCESSING

    Publication No.: US20190114499A1

    Publication date: 2019-04-18

    Application No.: US15786267

    Filing date: 2017-10-17

    Applicant: Xilinx, Inc.

    Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a first buffer configured to store a plurality of rows of the image data and output a row of the plurality of rows; a second buffer, coupled to the first buffer, including a plurality of storage locations to store a respective plurality of image samples of the row output by the first buffer; a plurality of shift registers; an interconnect network including a plurality of connections, each connection coupling a respective one of the plurality of shift registers to more than one of the plurality of storage locations, one or more of the plurality of storage locations being coupled to more than one of the plurality of connections; and a control circuit configured to load the plurality of shift registers with the plurality of image samples based on the plurality of connections and shift the plurality of shift registers to output the plurality of streams of image samples.

    Software defined neural network layer pipelining

    Publication No.: US12086572B1

    Publication date: 2024-09-10

    Application No.: US15786452

    Filing date: 2017-10-17

    Applicant: Xilinx, Inc.

    CPC classification number: G06F8/313 G06F8/47 G06F12/0646 G06N3/04 G06N20/00

    Abstract: Embodiments herein describe techniques for expressing the layers of a neural network in a software model. In one embodiment, the software model includes a class that describes the various functional blocks (e.g., convolution units, max-pooling units, rectified linear units (ReLU), and scaling functions) used to execute the neural network layers. In turn, other classes in the software model can describe the operation of each of the functional blocks. In addition, the software model can include conditional logic for expressing how the data flows between the functional blocks since different layers in the neural network can process the data differently. A compiler can convert the high-level code in the software model (e.g., C++) into a hardware description language (e.g., register transfer level (RTL)) which is used to configure a hardware system to implement a neural network accelerator.
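The class structure in this abstract can be sketched as follows. The abstract names C++ as an example source language; Python is used here only for brevity. The class names, the 1-D list data, and the `use_relu`/`use_pool` flags are hypothetical, the convolution unit is omitted, and nothing here models the compilation to RTL.

```python
class Scale:
    """Functional block: multiply every value by a fixed factor."""
    def __init__(self, factor):
        self.factor = factor

    def run(self, xs):
        return [x * self.factor for x in xs]


class ReLU:
    """Functional block: clamp negative values to zero."""
    def run(self, xs):
        return [max(0.0, x) for x in xs]


class MaxPool:
    """Functional block: maximum over non-overlapping windows."""
    def __init__(self, size):
        self.size = size

    def run(self, xs):
        return [max(xs[i:i + self.size]) for i in range(0, len(xs), self.size)]


class LayerPipeline:
    """Top-level class: owns the blocks and routes data between them per layer."""
    def __init__(self, scale=1.0, pool_size=2):
        self.scale = Scale(scale)
        self.relu = ReLU()
        self.pool = MaxPool(pool_size)

    def run(self, xs, use_relu=True, use_pool=True):
        xs = self.scale.run(xs)
        if use_relu:   # conditional data flow: some layers skip ReLU
            xs = self.relu.run(xs)
        if use_pool:   # conditional data flow: some layers skip pooling
            xs = self.pool.run(xs)
        return xs
```

The flags on `run` stand in for the conditional logic the abstract mentions, where different layers route data through different subsets of the functional blocks.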

    Software-defined buffer/transposer for general matrix multiplication in a programmable IC

    Publication No.: US11036827B1

    Publication date: 2021-06-15

    Application No.: US15786346

    Filing date: 2017-10-17

    Applicant: Xilinx, Inc.

    Abstract: Methods and apparatus are described for simultaneously buffering and reformatting (e.g., transposing) a matrix for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). Examples of the present disclosure increase the effective double data rate (DDR) memory throughput for streaming data into GEMM digital signal processing (DSP) engine multifold, as well as eliminate slow data reformatting on a host central processing unit (CPU). This may be accomplished through software-defined (e.g., C++) data structures and access patterns that result in hardware logic that simultaneously buffers and reorganizes the data to achieve linear DDR addressing.
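One way to picture buffering and transposing with linear addressing is the sketch below. It is an illustration under assumptions, not the patented design: the band-of-rows buffering scheme, the `tile` parameter, and the function name `stream_transposed` are hypothetical.

```python
def stream_transposed(flat, rows, cols, tile):
    """Read a row-major matrix in linear bursts of `tile` rows and emit each band column by column."""
    out = []
    for r0 in range(0, rows, tile):
        # One contiguous (linear-address) read fills the on-chip buffer.
        band = flat[r0 * cols:(r0 + tile) * cols]
        height = len(band) // cols
        # The buffer is then drained in transposed order.
        for c in range(cols):
            out.append([band[r * cols + c] for r in range(height)])
    return out
```

Memory is only ever read front to back here; the reordering happens entirely inside the buffered band, which is the property the abstract attributes to its buffer/transposer.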
