Performing consecutive MAC operations on a set of data using different kernels in a MAC circuit

    Publication No.: US11429850B2

    Publication Date: 2022-08-30

    Application No.: US16040357

    Application Date: 2018-07-19

    Applicant: Xilinx, Inc.

    Abstract: A circuit arrangement includes an array of multiply-and-accumulate (MAC) circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an input feature map (IFM) at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first output feature map (OFM) depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.
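The key idea in the abstract above is that one set of IFM data elements, fetched once at the slow rate, is reused across consecutive fast MAC cycles against a different cached kernel each cycle. A minimal Python sketch of that reuse pattern, with illustrative names not taken from the patent:

```python
# Hypothetical model: one MAC circuit holds several kernels in its
# cache (one per OFM depth index) and applies each of them, in
# consecutive MAC cycles, to the same set of IFM data elements.

def mac_cycles(ifm_data, kernel_cache):
    """Return one partial OFM value per cached kernel.

    ifm_data: IFM data elements received once (the slow, first rate).
    kernel_cache: list of kernels (weight lists) indexed by OFM depth,
                  consumed over consecutive fast MAC cycles.
    """
    ofm = []
    for kernel in kernel_cache:          # one MAC cycle per kernel
        acc = 0
        for x, w in zip(ifm_data, kernel):
            acc += x * w                 # multiply-and-accumulate
        ofm.append(acc)
    return ofm

# The same IFM set is processed twice, once per OFM depth index,
# without being re-fetched from outside the MAC circuit.
out = mac_cycles([1, 2, 3], [[1, 0, 1], [2, 2, 2]])
assert out == [4, 12]
```

Because the MAC-cycle rate is faster than the IFM delivery rate, the loop over cached kernels completes before the next set of data elements arrives.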

    Memory arrangement for tensor data

    Publication No.: US10346093B1

    Publication Date: 2019-07-09

    Application No.: US15923950

    Application Date: 2018-03-16

    Applicant: Xilinx, Inc.

    Abstract: Disclosed circuitry includes RAM circuits, a memory controller, and an array of processing circuits. Each RAM circuit includes a read port and a write port. The memory controller accesses tensor data arranged in banks of tensor buffers in the RAM circuits. The memory controller is coupled to each read port by shared read control signal lines and to each write port by shared write control signal lines. The memory controller generates read control and write control signals for accessing different ones of the tensor buffers at different times. The array of processing circuits is coupled to one of the RAM circuits. The array includes multiple rows and multiple columns of processing circuits for performing tensor operations on the tensor data. The processing circuits in each row in each array of processing circuits are coupled to input the same tensor data.
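One common way to access "different ones of the tensor buffers at different times" under shared control signals is bank ping-ponging: one bank receives new tensor data while another feeds the processing array. A minimal sketch under that assumption (the class and method names are illustrative, not from the patent):

```python
# Illustrative sketch: tensor buffers arranged in banks, with a single
# controller driving shared read/write control. A bank-select swap lets
# one bank be written while another is read.

class TensorBanks:
    def __init__(self, num_banks, depth):
        self.banks = [[None] * depth for _ in range(num_banks)]
        self.write_bank = 0          # bank currently receiving data
        self.read_bank = 1           # bank currently feeding the array

    def write(self, addr, value):
        # Shared write control: the same address drives every RAM's
        # write port; the bank select picks which buffer is updated.
        self.banks[self.write_bank][addr] = value

    def read(self, addr):
        # Shared read control: every processing row in the array is
        # presented with the same tensor data from the read bank.
        return self.banks[self.read_bank][addr]

    def swap(self):
        # Access different tensor buffers at different times by
        # exchanging the read and write banks.
        self.write_bank, self.read_bank = self.read_bank, self.write_bank

banks = TensorBanks(num_banks=2, depth=4)
banks.write(0, "tensor-A")   # controller fills bank 0
banks.swap()                 # bank 0 now feeds the processing array
assert banks.read(0) == "tensor-A"
```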

    DATA TRANSFERS BETWEEN A MEMORY AND A DISTRIBUTED COMPUTE ARRAY

    Publication No.: US20210174848A1

    Publication Date: 2021-06-10

    Application No.: US16706437

    Application Date: 2019-12-06

    Applicant: Xilinx, Inc.

    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channel interfaces and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.
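The controller described above synchronizes independently arriving memory-channel data: it waits until every remote buffer holds data, then asserts one read enable seen by all buffers, so the compute array receives aligned data from every channel at once. A hedged sketch of that readiness-then-broadcast behavior (function and variable names are assumptions for illustration):

```python
# Hypothetical model of the broadcast scheme: remote buffers (one per
# memory channel, spread across dies) fill at different times; the
# controller releases data to the compute array only when all are ready.
from collections import deque

def broadcast_when_ready(remote_buffers):
    """Pop one element from every buffer only when all are non-empty."""
    if all(len(buf) > 0 for buf in remote_buffers):   # readiness check
        # Broadcast read enable: every buffer transfers in the same step.
        return [buf.popleft() for buf in remote_buffers]
    return None  # hold off: at least one channel has not delivered yet

bufs = [deque([10]), deque([20]), deque()]
assert broadcast_when_ready(bufs) is None   # third buffer still empty
bufs[2].append(30)
assert broadcast_when_ready(bufs) == [10, 20, 30]
```

Gating the transfer on all buffers being non-empty is what keeps the per-channel data streams aligned when they cross die boundaries with differing latencies.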

    PERFORMING CONSECUTIVE MAC OPERATIONS ON A SET OF DATA USING DIFFERENT KERNELS IN A MAC CIRCUIT

    Publication No.: US20200026989A1

    Publication Date: 2020-01-23

    Application No.: US16040357

    Application Date: 2018-07-19

    Applicant: Xilinx, Inc.

    Abstract: A circuit arrangement includes an array of multiply-and-accumulate (MAC) circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an input feature map (IFM) at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first output feature map (OFM) depth index during a first MAC cycle, wherein a rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.

    Data transfers between a memory and a distributed compute array

    Publication No.: US11127442B2

    Publication Date: 2021-09-21

    Application No.: US16706437

    Application Date: 2019-12-06

    Applicant: Xilinx, Inc.

    Abstract: An integrated circuit (IC) includes a plurality of dies. The IC includes a plurality of memory channel interfaces configured to communicate with a memory, wherein the plurality of memory channel interfaces are disposed within a first die of the plurality of dies. The IC may include a compute array distributed across the plurality of dies and a plurality of remote buffers distributed across the plurality of dies. The plurality of remote buffers are coupled to the plurality of memory channel interfaces and to the compute array. The IC further includes a controller configured to determine that each of the plurality of remote buffers has data stored therein and, in response, broadcast a read enable signal to each of the plurality of remote buffers initiating data transfers from the plurality of remote buffers to the compute array across the plurality of dies.

    Neural-network pooling
    Invention Grant

    Publication No.: US11531869B1

    Publication Date: 2022-12-20

    Application No.: US16368397

    Application Date: 2019-03-28

    Applicant: Xilinx, Inc.

    Abstract: Embodiments herein describe circuitry with improved efficiency when executing layers in a nested neural network. A nested neural network has at least one split operation where a tensor generated by a first layer is transmitted to, and processed by, several branches in the neural network. Each of these branches can have several layers with data dependencies that leave a multiply-add array sitting idle. In one embodiment, the circuitry can include a dedicated pre-pooler for performing a pre-pooling operation. Thus, the pre-pooling operation can be performed in parallel with other operations (e.g., the convolution performed by another layer). Once the multiply-add array is idle, the pre-pooling operation has already completed (or at least, has already started), which means the time the multiply-add array must wait before it can perform the next operation is reduced or eliminated.
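The scheduling idea in the abstract can be sketched in a few lines: a dedicated pre-pooler reduces the split tensor while the multiply-add array is busy with another branch's convolution, so the pooled result is ready (or nearly ready) by the time the array goes idle. The 1-D operations and names below are simplified stand-ins, not the patent's circuitry:

```python
# Sketch of overlapping pre-pooling with convolution. The thread pool
# plays the role of the dedicated pre-pooler running concurrently with
# the multiply-add array; both consume the same post-split tensor.
from concurrent.futures import ThreadPoolExecutor

def pre_pool(tensor, window=2):
    # Max-pool over non-overlapping windows of a 1-D tensor.
    return [max(tensor[i:i + window])
            for i in range(0, len(tensor), window)]

def convolve(tensor, kernel):
    # Valid 1-D convolution standing in for the other branch's layer,
    # i.e. the work that keeps the multiply-add array busy.
    k = len(kernel)
    return [sum(x * w for x, w in zip(tensor[i:i + k], kernel))
            for i in range(len(tensor) - k + 1)]

tensor = [1, 3, 2, 5, 4, 0]           # tensor produced by the split
with ThreadPoolExecutor() as pool:
    pooled_future = pool.submit(pre_pool, tensor)  # pre-pooler starts
    conv_out = convolve(tensor, [1, -1])           # array is busy here
pooled = pooled_future.result()  # already available once the array idles

assert pooled == [3, 5, 4]
assert conv_out == [-2, 1, -3, 1, 4]
```

The benefit is exactly the one the abstract claims: the pooling branch's result no longer serializes behind the convolution branch, so the array's idle wait shrinks or disappears.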

    Neural network controller
    Invention Grant

    Publication No.: US11429851B1

    Publication Date: 2022-08-30

    Application No.: US16219303

    Application Date: 2018-12-13

    Applicant: Xilinx, Inc.

    Abstract: Disclosed circuits and methods involve a first register configured to store a first convolutional neural network (CNN) instruction during processing of the first CNN instruction and a second register configured to store a second CNN instruction during processing of the second CNN instruction. Each of a plurality of address generation circuits is configured to generate one or more addresses in response to an input CNN instruction. Control circuitry is configured to select one of the first CNN instruction or the second CNN instruction as input to the address generation circuits.
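The two-register arrangement above is a double-buffering scheme: one register's instruction can be loaded while the other's is driving the address generators. A minimal sketch under that reading, with assumed instruction fields (`base`, `stride`, `count`) that are illustrative only:

```python
# Hypothetical model of the dual instruction registers: control
# circuitry selects which stored CNN instruction feeds the address
# generation circuits, so loading and processing can overlap.

def generate_addresses(instr):
    """One address generator: base + stride * i for `count` accesses."""
    return [instr["base"] + instr["stride"] * i
            for i in range(instr["count"])]

class InstructionRegisters:
    def __init__(self):
        self.regs = [None, None]   # first and second CNN instruction
        self.selected = 0          # control circuitry's selection

    def load(self, slot, instr):
        # Store an instruction while the other may still be processing.
        self.regs[slot] = instr

    def addresses(self):
        # The selected register is the input to the address generators.
        return generate_addresses(self.regs[self.selected])

ctrl = InstructionRegisters()
ctrl.load(0, {"base": 0, "stride": 4, "count": 3})
ctrl.load(1, {"base": 100, "stride": 8, "count": 2})
assert ctrl.addresses() == [0, 4, 8]
ctrl.selected = 1                  # switch to the second instruction
assert ctrl.addresses() == [100, 108]
```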
