-
Publication number: US10990552B1
Publication date: 2021-04-27
Application number: US15944464
Application date: 2018-04-03
Applicant: Xilinx, Inc.
Inventor: Goran Hk Bilski, Peter McColgan, Juan J. Noguera Serra, Baris Ozgul, Jan Langer, Richard L. Walke, Ralph D. Wittig, Kornelis A. Vissers, Philip B. James-Roxby, Christopher H. Dick
Abstract: Examples herein describe techniques for communicating between data processing engines in an array of data processing engines. In one embodiment, the array is a 2D array where each of the DPEs includes one or more cores. In addition to the cores, the data processing engines can include a memory module (with memory banks for storing data) and an interconnect which provides connectivity between the engines. To transmit processed data, a data processing engine identifies a destination processing engine in the array. Once identified, the data processing engine can transmit the processed data using a reserved point-to-point communication path in the interconnect that couples the source and destination data processing engines.
-
Publication number: US11853235B2
Publication date: 2023-12-26
Application number: US17826068
Application date: 2022-05-26
Applicant: XILINX, INC.
Inventor: Juan J. Noguera Serra, Goran Hk Bilski, Baris Ozgul, Jan Langer
IPC: G06F13/16, G06F12/084, G06F9/54, G11C8/16, G06F15/167
CPC classification number: G06F13/1663, G06F9/544, G06F12/084, G06F15/167, G11C8/16
Abstract: Examples herein describe techniques for transferring data between data processing engines in an array using shared memory. In one embodiment, certain engines in the array have connections to the memory in neighboring engines. For example, each engine may have its own assigned memory module which can be accessed directly (e.g., without using a streaming or memory mapped interconnect). In addition, the surrounding engines (referred to herein as the neighboring engines) may also include direct connections to the memory module. Using these direct connections, the cores can load and/or store data in the neighboring memory modules.
-
Publication number: US11443091B1
Publication date: 2022-09-13
Application number: US16945006
Application date: 2020-07-31
Applicant: Xilinx, Inc.
Inventor: Peter McColgan, Baris Ozgul, David Clarke, Tim Tuan, Juan J. Noguera Serra, Goran H. K. Bilski, Jan Langer, Sneha Bhalchandra Date, Stephan Munz, Jose Marques
IPC: G06F30/343, G06F9/30, G06F30/398, G06F30/33
Abstract: An integrated circuit includes a plurality of data processing engines (DPEs). Each DPE may include a core configured to perform computations. A first DPE of the plurality of DPEs includes a first core coupled to an input cascade connection of the first core. The input cascade connection is directly coupled to a plurality of source cores of the plurality of DPEs. The input cascade connection includes a plurality of inputs, wherein each of the plurality of inputs is connected to a cascade output of a different one of the plurality of source cores. The input cascade connection is programmable to enable a selected one of the plurality of inputs.
-
Publication number: US11372803B2
Publication date: 2022-06-28
Application number: US15944408
Application date: 2018-04-03
Applicant: Xilinx, Inc.
Inventor: Goran H. K. Bilski, Juan J. Noguera Serra, Baris Ozgul, Jan Langer, David Clarke, Sneha Bhalchandra Date
IPC: G06F15/80, G06F13/40, G06F15/173, G06F13/16
Abstract: An example data processing engine (DPE) for a DPE array in an integrated circuit (IC) includes: a core; a memory including a data memory and a program memory, the program memory coupled to the core, the data memory coupled to the core and including at least one connection to a respective at least one additional core external to the DPE; support circuitry including hardware synchronization circuitry and direct memory access (DMA) circuitry each coupled to the data memory; streaming interconnect coupled to the DMA circuitry and the core; and memory-mapped interconnect coupled to the core, the memory, and the support circuitry.
-
Publication number: US11113223B1
Publication date: 2021-09-07
Application number: US15944490
Application date: 2018-04-03
Applicant: Xilinx, Inc.
Inventor: Peter McColgan, Goran H K Bilski, Juan J. Noguera Serra, Jan Langer, Baris Ozgul, David Clarke
Abstract: Examples herein describe techniques for communicating between data processing engines in an array of data processing engines. In one embodiment, the array is a 2D array where each of the DPEs includes one or more cores. In addition to the cores, the data processing engines can include streaming interconnects which transmit streaming data using two different modes: circuit switching and packet switching. Circuit switching establishes reserved point-to-point communication paths between endpoints in the interconnect which routes data in a deterministic manner. Packet switching, in contrast, transmits streaming data that includes headers for routing data within the interconnect in a non-deterministic manner. In one embodiment, the streaming interconnects can have one or more ports configured to perform circuit switching and one or more ports configured to perform packet switching.
-
Publication number: US10747690B2
Publication date: 2020-08-18
Application number: US15944307
Application date: 2018-04-03
Applicant: Xilinx, Inc.
Inventor: Goran H K Bilski, Juan J. Noguera Serra, Baris Ozgul, Jan Langer, Richard L. Walke, Ralph D. Wittig, Kornelis A. Vissers, Philip B. James-Roxby, Christopher H. Dick
IPC: G06F15/78, G06F13/16, G06F13/40, G06F15/173, H04L12/933
Abstract: A device may include a plurality of data processing engines. Each data processing engine may include a core and a memory module. Each core may be configured to access the memory module in the same data processing engine and a memory module within at least one other data processing engine of the plurality of data processing engines.
-
Publication number: US10579559B1
Publication date: 2020-03-03
Application number: US15944303
Application date: 2018-04-03
Applicant: Xilinx, Inc.
Inventor: Goran H K Bilski, Juan J. Noguera Serra, Jan Langer, Baris Ozgul
Abstract: An example data processing engine (DPE) for a DPE array in an integrated circuit (IC) includes a core, a memory including a data memory and a program memory, the program memory coupled to the core, the data memory coupled to the core and including at least one connection to a respective at least one additional core external to the DPE; support circuitry including hardware synchronization circuitry and direct memory access (DMA) circuitry each coupled to the data memory, and a stall circuit coupled to the core configured to stall or resume the core in response to one or more inputs.
-
Publication number: US20190303311A1
Publication date: 2019-10-03
Application number: US15944307
Application date: 2018-04-03
Applicant: Xilinx, Inc.
Inventor: Goran HK Bilski, Juan J. Noguera Serra, Baris Ozgul, Jan Langer, Richard L. Walke, Ralph D. Wittig, Kornelis A. Vissers, Philip B. James-Roxby, Christopher H. Dick
Abstract: A device may include a plurality of data processing engines. Each data processing engine may include a core and a memory module. Each core may be configured to access the memory module in the same data processing engine and a memory module within at least one other data processing engine of the plurality of data processing engines.
-
Publication number: US09189458B1
Publication date: 2015-11-17
Application number: US13785135
Application date: 2013-03-05
Applicant: Xilinx, Inc.
Inventor: Jan Langer, Baris Ozgul, Juan J. Noguera Serra
CPC classification number: H03F3/19, G06F17/16, H03F1/3247, H03F1/3258, H03F3/245
Abstract: An apparatus relating generally to generation of a compressed matrix is disclosed. In this apparatus, a row determination block is coupled to receive input samples and configuration information and is configured to provide a row output for each of the input samples. A matrix determination block is coupled to receive the row output and the configuration information. The matrix determination block is configured to: generate pivot row indices responsive to the configuration information; generate each outer product using the row output and any of the pivot row indices therefor; and accumulate, for each of the input samples, the outer product therefor for inclusion in the compressed matrix.