-
Publication No.: US20230058749A1
Publication Date: 2023-02-23
Application No.: US17867625
Application Date: 2022-07-18
Applicant: XILINX, INC.
Inventor: Stephan MUNZ, Francisco Barat QUESADA, Baris OZGUL, Javier CABEZAS RODRIGUEZ, Zachary DICKMAN, Pedro Miguel Parola DUARTE, Dylan STUART, Juan J. NOGUERA SERRA
IPC: G06F17/16, G06F15/80, H03K19/173
Abstract: Examples herein describe techniques for adapting a multiplier array (e.g., a systolic array implemented in a processing core) to perform different dot products. The processing core can include data selection logic that enables different configurations of the multiplier array while using the same underlying hardware. That is, the multiplier array is fixed hardware, but the data selection logic can transmit data into the multiplier array such that it is configured to perform different length dot products, perform more dot products in parallel, or change its output precision. In this manner, the same underlying hardware (i.e., the multiplier array) can be reconfigured for different dot products, which can result in much more efficient use of the hardware.
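Illustrative sketch (not taken from the patent): the idea of reusing one fixed set of multipliers for dot products of different lengths can be modeled in a few lines of Python. The function name `grouped_dot_products`, the array size of eight multipliers, and the way the products are grouped are all assumptions made for illustration; the abstract does not describe the actual data selection logic.

```python
def grouped_dot_products(a, b, length):
    """Model a fixed multiplier array whose products are summed in groups.

    The element-wise multiplies stand in for the fixed hardware multipliers.
    The `length` argument stands in for the (hypothetical) data selection
    setting that decides how many products feed each dot product.
    """
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same size")
    if length <= 0 or len(a) % length != 0:
        raise ValueError("length must evenly divide the number of multipliers")
    # One multiply per multiplier, regardless of configuration.
    products = [x * y for x, y in zip(a, b)]
    # Group the same products differently to get longer or more dot products.
    return [sum(products[i:i + length]) for i in range(0, len(products), length)]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, 1, 1, 1, 2, 2, 2, 2]

one_long = grouped_dot_products(a, b, 8)    # one dot product of length 8
two_medium = grouped_dot_products(a, b, 4)  # two dot products of length 4
four_short = grouped_dot_products(a, b, 2)  # four dot products of length 2
```

In this toy model the number of multiplications is the same in every configuration; only the grouping of the products changes, which mirrors the abstract's point that the hardware is fixed while the configuration varies.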
-
Publication No.: US20230059970A1
Publication Date: 2023-02-23
Application No.: US17867630
Application Date: 2022-07-18
Applicant: XILINX, INC.
Inventor: Francisco Barat QUESADA, Baris OZGUL, Dylan STUART, Stephan MUNZ, Zachary DICKMAN, Javier CABEZAS RODRIGUEZ, David Patrick CLARKE, Pedro Miguel Parola DUARTE, Peter MCCOLGAN, Juan J. NOGUERA SERRA
IPC: G06N20/00
Abstract: Examples herein describe techniques for reducing the amount of memory used during weight sparsity. When decompressing the weights, the uncompressed weight data typically has many zero values. By knowing the location of these zero values (e.g., their indices in a weight matrix), the processor core can prune some of the activations (e.g., logically reduce the size of the activation matrix), which improves the efficiency of the processor core. In embodiments herein, the processor core includes logic for identifying the indices of the non-zero values after decompressing the compressed weights. These indices can then be used to prune the activations to improve the efficiency of the processor core.
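Illustrative sketch (not taken from the patent): the flow of decompressing weights, finding the indices of the non-zero values, and pruning the activations with those indices might look as follows in Python. The bitmask-plus-values compression format and the helper names `decompress`, `nonzero_indices`, and `pruned_dot` are assumptions chosen for illustration; the abstract does not specify a compression scheme.

```python
def decompress(values, mask):
    """Expand compressed weights: `mask` marks non-zero positions (assumed format)."""
    remaining = iter(values)
    return [next(remaining) if bit else 0 for bit in mask]


def nonzero_indices(weights):
    """Return the positions of the non-zero weights."""
    return [i for i, w in enumerate(weights) if w != 0]


def pruned_dot(weights, activations):
    """Dot product that only touches activations paired with non-zero weights."""
    kept = nonzero_indices(weights)
    pruned_activations = [activations[i] for i in kept]
    pruned_weights = [weights[i] for i in kept]
    total = sum(w * x for w, x in zip(pruned_weights, pruned_activations))
    return total, kept


mask = [1, 0, 0, 1, 0, 1, 0, 0]
values = [2, -1, 3]
activations = [5, 7, 9, 4, 6, 2, 8, 1]

weights = decompress(values, mask)
result, kept = pruned_dot(weights, activations)
dense_result = sum(w * x for w, x in zip(weights, activations))
```

Here only three of the eight activations are read, yet the pruned result equals the dense dot product, because every skipped activation would have been multiplied by zero.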
-
Publication No.: US20240256482A1
Publication Date: 2024-08-01
Application No.: US18633398
Application Date: 2024-04-11
Applicant: XILINX, INC.
Inventor: Juan J. NOGUERA SERRA, Sneha Bhalchandra DATE, Jan LANGER, Baris OZGUL, Goran Hk BILSKI
IPC: G06F15/177, G06F9/4401, G06F15/173, G06F15/80
CPC classification number: G06F15/177, G06F15/17306, G06F15/80, G06F9/4401, G06F9/4411
Abstract: A device may include a processor system and an array of data processing engines (DPEs) communicatively coupled to the processor system. Each of the DPEs includes a core and a DPE interconnect. The processor system is configured to transmit configuration data to the array of DPEs, and each of the DPEs is independently configurable based on the configuration data received at the respective DPE via the DPE interconnect of the respective DPE. The array of DPEs enables, without modifying operation of a first kernel of a first subset of the DPEs of the array of DPEs, reconfiguration of a second subset of the DPEs of the array of DPEs.
-
Publication No.: US20220283963A1
Publication Date: 2022-09-08
Application No.: US17826068
Application Date: 2022-05-26
Applicant: XILINX, INC.
Inventor: Juan J. NOGUERA SERRA, Goran Hk BILSKI, Baris OZGUL, Jan LANGER
IPC: G06F13/16, G06F12/084, G06F9/54
Abstract: Examples herein describe techniques for transferring data between data processing engines in an array using shared memory. In one embodiment, certain engines in the array have connections to the memory in neighboring engines. For example, each engine may have its own assigned memory module which can be accessed directly (e.g., without using a streaming or memory mapped interconnect). In addition, the surrounding engines (referred to herein as the neighboring engines) may also include direct connections to the memory module. Using these direct connections, the cores can load and/or store data in the neighboring memory modules.
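Illustrative sketch (not taken from the patent): a toy Python model of engines in a 2D array exchanging data through a neighbor's memory module. The grid size, the choice of north/south/east/west engines as the "neighboring" ones, and the helper names `directly_accessible`, `store`, and `load` are assumptions for illustration only; the abstract does not say which engines are connected.

```python
ROWS, COLS = 2, 2

# One memory module per engine, keyed by (row, col) and modeled as a dict.
memory = {(r, c): {} for r in range(ROWS) for c in range(COLS)}


def directly_accessible(row, col, rows=ROWS, cols=COLS):
    """Memory modules a core can reach directly: its own plus assumed N/S/W/E neighbors."""
    candidates = [(row, col), (row - 1, col), (row + 1, col),
                  (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def store(core, target, key, value):
    """Store into `target`'s memory if `core` has a direct connection to it."""
    if target not in directly_accessible(*core):
        raise PermissionError(f"core {core} has no direct connection to {target}")
    memory[target][key] = value


def load(core, target, key):
    """Load from `target`'s memory if `core` has a direct connection to it."""
    if target not in directly_accessible(*core):
        raise PermissionError(f"core {core} has no direct connection to {target}")
    return memory[target][key]


# Core (0, 0) writes into its neighbor's memory; core (0, 1) reads its own memory.
store((0, 0), (0, 1), "tile", [1, 2, 3])
received = load((0, 1), (0, 1), "tile")
reachable = directly_accessible(0, 0)
```

In this model the data moves from one engine to the next without any interconnect in between, which is the point the abstract makes about direct connections to neighboring memory modules.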
-
Publication No.: US20230205726A1
Publication Date: 2023-06-29
Application No.: US18114850
Application Date: 2023-02-27
Applicant: XILINX, INC.
Inventor: Juan J. NOGUERA SERRA, Sneha Bhalchandra DATE, Jan LANGER, Baris OZGUL, Goran H.k. BILSKI
IPC: G06F15/177, G06F15/173, G06F15/80
CPC classification number: G06F15/177, G06F15/17306, G06F15/80, G06F9/4401
Abstract: A device may include a processor system and an array of data processing engines (DPEs) communicatively coupled to the processor system. Each of the DPEs includes a core and a DPE interconnect. The processor system is configured to transmit configuration data to the array of DPEs, and each of the DPEs is independently configurable based on the configuration data received at the respective DPE via the DPE interconnect of the respective DPE. The array of DPEs enables, without modifying operation of a first kernel of a first subset of the DPEs of the array of DPEs, reconfiguration of a second subset of the DPEs of the array of DPEs.
-
Publication No.: US20220283985A1
Publication Date: 2022-09-08
Application No.: US17826070
Application Date: 2022-05-26
Applicant: XILINX, INC.
Inventor: Goran Hk BILSKI, Juan J. NOGUERA SERRA, Baris OZGUL, Jan LANGER, David CLARKE, Sneha Bhalchandra DATE
IPC: G06F15/80, G06F13/40, G06F15/173, G06F13/16
Abstract: An example data processing engine (DPE) for a DPE array in an integrated circuit (IC) includes: a core; a memory including a data memory and a program memory, the program memory coupled to the core, the data memory coupled to the core and including at least one connection to a respective at least one additional core external to the DPE; support circuitry including hardware synchronization circuitry and direct memory access (DMA) circuitry each coupled to the data memory; streaming interconnect coupled to the DMA circuitry and the core; and memory-mapped interconnect coupled to the core, the memory, and the support circuitry.
-
Publication No.: US20220015588A1
Publication Date: 2022-01-20
Application No.: US17468346
Application Date: 2021-09-07
Applicant: XILINX, INC.
Inventor: Peter MCCOLGAN, Goran Hk BILSKI, Juan J. NOGUERA SERRA, Jan LANGER, Baris OZGUL, David CLARKE
Abstract: Examples herein describe techniques for communicating between data processing engines in an array of data processing engines. In one embodiment, the array is a 2D array where each of the DPEs includes one or more cores. In addition to the cores, the data processing engines can include streaming interconnects which transmit streaming data using two different modes: circuit switching and packet switching. Circuit switching establishes reserved point-to-point communication paths between endpoints in the interconnect which routes data in a deterministic manner. Packet switching, in contrast, transmits streaming data that includes headers for routing data within the interconnect in a non-deterministic manner. In one embodiment, the streaming interconnects can have one or more ports configured to perform circuit switching and one or more ports configured to perform packet switching.
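Illustrative sketch (not taken from the patent): the difference between circuit-switched and packet-switched ports can be modeled with a small Python class. The class name `StreamSwitch`, the port names, and the `(header, payload)` tuple format are hypothetical and chosen only for illustration; the abstract does not define a header layout or a routing table.

```python
class StreamSwitch:
    """Toy switch with some circuit-switched and some packet-switched input ports."""

    def __init__(self, circuit_routes, packet_table):
        # Circuit switching: each input port has one reserved output port.
        self.circuit_routes = circuit_routes
        # Packet switching: the output port is looked up from a header ID.
        self.packet_table = packet_table

    def route(self, in_port, word):
        """Return (out_port, payload) for a word arriving on `in_port`."""
        if in_port in self.circuit_routes:
            # Reserved path: no header needed, the destination is fixed.
            return self.circuit_routes[in_port], word
        # Packet-switched port: the word carries its own routing header.
        header, payload = word
        return self.packet_table[header], payload


switch = StreamSwitch(
    circuit_routes={"west0": "east0"},
    packet_table={7: "north0", 9: "south0"},
)

circuit_result = switch.route("west0", 0xABCD)
packet_result_a = switch.route("west1", (7, 0x1234))
packet_result_b = switch.route("west1", (9, 0x5678))
```

The circuit-switched port always delivers to the same output, while the packet-switched port can deliver consecutive words to different outputs depending on their headers.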