-
1.
Publication No.: US12111878B2
Publication Date: 2024-10-08
Application No.: US16363463
Filing Date: 2019-03-25
Inventors: Geoffrey Burr, Benjamin Killeen
CPC Classification: G06F17/16, G06F7/5443, G06F17/15, G06G7/16, G06N3/065
Abstract: According to one or more embodiments, a computer-implemented method for implementing a convolutional neural network (CNN) using a crosspoint array includes configuring the crosspoint array corresponding to a convolution layer in the CNN by storing one or more convolution kernels of the convolution layer in one or more crosspoint devices of the crosspoint array. The method further includes performing computations for the CNN via the crosspoint array by transmitting voltage pulses corresponding to a vector of input data of the convolution layer to the crosspoint array. Performing the CNN computations further includes outputting an electric current representative of a multiplication operation at a crosspoint device in the crosspoint array, based on a weight value stored by the crosspoint device and the voltage pulses from the input data, and passing the output electric current from the crosspoint device to a selected integrator.
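As a rough illustration (not taken from the patent), the multiply-accumulate a crossbar performs can be modeled in NumPy: each device contributes a current proportional to its stored conductance times the applied voltage (Ohm's law), and the currents summed along a column (Kirchhoff's current law) play the role of the integrator input. Function and variable names here are illustrative:

```python
import numpy as np

# Hypothetical sketch: a crossbar stores kernel weights as device conductances G.
# Input voltages V are applied on the rows; each device contributes
# I_ij = G_ij * V_i (Ohm's law), and the per-column current sum
# (Kirchhoff's current law) is what a column's integrator accumulates.
def crossbar_mvm(conductances, voltages):
    # conductances: shape (rows, cols); voltages: shape (rows,)
    currents = conductances * voltages[:, None]  # per-device multiplication
    return currents.sum(axis=0)                  # column-wise integration

G = np.array([[0.1, 0.2],
              [0.3, 0.4]])
V = np.array([1.0, 2.0])
print(crossbar_mvm(G, V))  # equivalent to G.T @ V
```

The column-sum makes the analog array compute a full matrix-vector product in one step, which is the appeal of the crosspoint approach.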
-
2.
Publication No.: US12067481B2
Publication Date: 2024-08-20
Application No.: US16159684
Filing Date: 2018-10-14
Inventor: Geoffrey Burr
CPC Classification: G06N3/065, G06F9/3885, G06N3/045, G06N3/08
Abstract: Array-integrated upstream/downstream routers for circuit-switched parallel connectivity are provided. A system comprises an array of neural cores having at least one dimension, a plurality of signal wires, and a plurality of routers. Each neural core comprises a plurality of ordered input wires, a plurality of ordered output wires, and a plurality of synapses, each synapse operatively coupled to one of the input wires and one of the output wires. The signal wires are disposed along each dimension of the array of neural cores. Each router is operatively coupled to one of the neural cores and to at least one signal wire along each dimension of the array, and is adapted to selectively route a signal from the at least one signal wire to its coupled neural core and from its coupled neural core to the at least one signal wire.
-
3.
Publication No.: US12050997B2
Publication Date: 2024-07-30
Application No.: US16884130
Filing Date: 2020-05-27
CPC Classification: G06N3/084, G11C7/1006, G11C11/54, G11C13/0069, G06N3/063, G11C2213/77, G11C2213/79
Abstract: A computer-implemented method for implementing a convolutional neural network (CNN) using a crosspoint array includes configuring the crosspoint array to implement a convolution layer by storing one or more weights in crosspoint devices of the array. The method further includes making multiple copies of the weights and training the CNN. Training the CNN includes mapping input data of the convolution layer to the crosspoint array in a row-by-row manner. The excitation is input row by row into the crosspoint array, creating row-by-row forward output from the crosspoint array, and outputs from the crosspoint devices are stored in corresponding integrators. Errors in the outputs, as compared to a desired output, from multiple rows are computed and back-propagated row by row into the crosspoint array, and the computed errors are transmitted to a previous convolution layer.
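A minimal software sketch of the row-by-row convolution described above, with a flattened kernel standing in for the weights stored in the crosspoint devices and a plain dot product standing in for the analog multiply-accumulate (names and structure are illustrative, not the patent's):

```python
import numpy as np

# Hypothetical sketch: each output row is produced by feeding the
# corresponding input-image rows, patch by patch, against the flattened
# kernel; the dot product models the integrator-accumulated crossbar output.
def conv2d_row_by_row(image, kernel):
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    flat_k = kernel.reshape(-1)                  # weights "stored" in the array
    for r in range(out.shape[0]):                # row-by-row excitation
        for c in range(out.shape[1]):
            patch = image[r:r + kh, c:c + kw].reshape(-1)
            out[r, c] = flat_k @ patch           # analog MVM per output element
    return out
```

In the hardware version the same weights serve every patch, which is why the abstract emphasizes making multiple copies of the weights to parallelize the rows.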
-
4.
Publication No.: US20230305841A1
Publication Date: 2023-09-28
Application No.: US17701308
Filing Date: 2022-03-22
Inventors: Shubham Jain, Geoffrey Burr, Yasuteru Kohda
CPC Classification: G06F9/30036, G06F9/30032, G06F9/3877
Abstract: Efficient data layout and alignment techniques for effectively executing AI workloads in wide-vector accelerator systems are provided. In one aspect, a method for processing AI workloads includes: logically dividing a data vector into a hierarchy of segments and sub-segments, with each segment including more than one sub-segment, each sub-segment including words, and each word including data bits; and physically mapping the data bits such that words belonging to a same given one of the sub-segments are mapped contiguously across all of the segments. An AI accelerator system is also provided.
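One way to picture the mapping is as an index permutation. The sketch below (function name and parameters are assumptions, not the patent's) places the words of each sub-segment index contiguously across all segments, which is the alignment property a wide-vector lane would want:

```python
# Hypothetical sketch of the interleaved layout: a vector is split into
# n_seg segments, each of n_sub sub-segments, each holding n_word words.
# Physically, the words at sub-segment index k from *every* segment are
# placed contiguously, so wide-vector reads see aligned data.
def physical_layout(vector, n_seg, n_sub, n_word):
    assert len(vector) == n_seg * n_sub * n_word
    out = []
    for k in range(n_sub):          # sub-segment index first ...
        for s in range(n_seg):      # ... then across all segments
            base = (s * n_sub + k) * n_word
            out.extend(vector[base:base + n_word])
    return out

# e.g. 2 segments x 2 sub-segments x 2 words:
print(physical_layout(list(range(8)), 2, 2, 2))  # [0, 1, 4, 5, 2, 3, 6, 7]
```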
-
5.
Publication No.: US20230096894A1
Publication Date: 2023-03-30
Application No.: US17487372
Filing Date: 2021-09-28
Inventors: Geoffrey Burr, Kohji Hosokawa
Abstract: An array of neural cores has at least two dimensions. Each neural core comprises ordered input wires, ordered output wires, and synapses, each synapse operatively coupled to one of the input wires and one of the output wires. Signal wires are provided, with at least one disposed along each dimension of the array and each running along at least one dimension. Routers are provided, each operatively coupled to (i) one of the neural cores and (ii) at least two of the signal wires, one along each dimension of the array. Each router is configured to selectively route a signal from one of its coupled signal wires to its coupled neural core, and from its coupled neural core to one of its coupled signal wires.
-
6.
Publication No.: US20220405554A1
Publication Date: 2022-12-22
Application No.: US17350162
Filing Date: 2021-06-17
IPC Classification: G06N3/063
Abstract: Embodiments herein disclose computer-implemented methods, computer program products, and computer systems for balancing neural network weight asymmetries. The computer-implemented method may include providing a neural network with weights comprising one or more major conductance pairs and one or more minor conductance pairs. The method may further include: programming the major conductance pairs to force an inference output to an expected duration value; determining a positive weight coefficient based on the major conductance pairs and a negative weight coefficient based on the minor conductance pairs; determining one or more target weights based on one or more of the positive and negative weight coefficients; programming the minor conductance pairs to force the inference output to the expected duration value; and programming the major conductance pairs with the target weights.
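A common convention for such paired-conductance weights is w = F·(G⁺ − G⁻) + f·(g⁺ − g⁻), with a large scale factor on the major pair and a small one on the minor "trim" pair; the sketch below uses that convention, but the factor values and helper names are illustrative assumptions, not the patent's:

```python
# Hypothetical sketch of paired-conductance weights: a major pair carries
# the bulk of the weight at scale F, and a minor pair trims the residual
# at scale f. F=4, f=1 are assumed values for illustration only.
def effective_weight(G_plus, G_minus, g_plus, g_minus, F=4.0, f=1.0):
    return F * (G_plus - G_minus) + f * (g_plus - g_minus)

def trim_minor_pair(target, G_plus, G_minus, F=4.0, f=1.0):
    # Program the minor pair to absorb whatever the major pair missed.
    residual = target - F * (G_plus - G_minus)
    return residual / f  # desired (g_plus - g_minus)
```

With a target weight of 1.0 and a major pair reading (0.3, 0.1), the major contribution is 0.8 and the minor pair is programmed to supply the remaining 0.2.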
-
7.
Publication No.: US12045612B2
Publication Date: 2024-07-23
Application No.: US17931537
Filing Date: 2022-09-12
CPC Classification: G06F9/30036, G06F9/3555, G06N20/00
Abstract: An efficient pipelined implementation of digital scaling, offset, and aggregation operations supports element-by-element programmable scale and offset factors. The method includes time-multiplexed parallel pipelining of a plurality of digital data words, each encoding an N-bit signed integer, from one of a plurality of receive-registers through a datapath that can either (1) store the digital data words directly in a dedicated first memory, (2) store them directly in a dedicated second memory, or (3) direct them into a parallel set of fused multiply-add units. The method further includes multiplying each digital data word by a corresponding data word retrieved from the dedicated first memory to form product data words, and adding the product data words to corresponding data words retrieved from the dedicated second memory to form output sum-and-product data words.
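The per-element datapath step reduces to a fused multiply-add, y[i] = x[i]·scale[i] + offset[i], with the scale factors read from the first memory and the offsets from the second. A minimal sketch with names of my own choosing:

```python
# Hypothetical sketch of the element-wise scale-and-offset step: each
# incoming word is multiplied by its per-element scale factor (first
# memory) and the product is added to its offset (second memory).
def scale_offset(words, scales, offsets):
    assert len(words) == len(scales) == len(offsets)
    return [w * s + o for w, s, o in zip(words, scales, offsets)]

print(scale_offset([1, 2, 3], [10, 10, 10], [0, 1, 2]))  # [10, 21, 32]
```

In the hardware version these run as parallel FMA units fed by time-multiplexed pipelining; the list comprehension stands in for that parallelism.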
-
8.
Publication No.: US20240220572A1
Publication Date: 2024-07-04
Application No.: US18092183
Filing Date: 2022-12-30
Abstract: A compute engine is configured to perform self-attention computations by delaying performance of a division operation of a softmax computation. The performance includes: iteratively computing a first matrix multiplication of a given row vector of a first matrix and each column vector of a second matrix, while determining a first scalar element representing a maximum value of the iterative first matrix multiplications; iteratively subtracting the corresponding first scalar element from the result of each computed first matrix multiplication, and computing an elementwise exponential function on the result of the subtraction to generate the elements of a given row vector of a fourth matrix; iteratively computing a second matrix multiplication of a given row vector of the fourth matrix and each column vector of a third matrix, while summing the given row vectors of the fourth matrix; and computing a row vector of an output matrix.
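The delayed-division scheme resembles the numerically stable softmax used in attention kernels: exponentiate relative to the row maximum, accumulate the weighted sum and the normalizer together, and divide once at the end. A single-row sketch (variable names are mine, and the iterative per-column bookkeeping is collapsed into whole-vector operations for brevity):

```python
import numpy as np

# Hypothetical sketch of softmax with the division deferred, for one
# query row q against key matrix K and value matrix V:
def attention_row(q, K, V):
    scores = K @ q             # "first matrix multiplication", one per column
    m = scores.max()           # first scalar: running maximum of the row
    p = np.exp(scores - m)     # elementwise exponential -> row of "fourth matrix"
    num = p @ V                # "second matrix multiplication"
    denom = p.sum()            # row sum accumulated alongside
    return num / denom         # the single, delayed division
```

Subtracting the maximum before exponentiating keeps the intermediate values bounded, and deferring the division avoids materializing the normalized softmax matrix.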
-
9.
Publication No.: US20240211532A1
Publication Date: 2024-06-27
Application No.: US18083011
Filing Date: 2022-12-16
Abstract: Systems and methods for performing layer normalization are described. A circuit can receive a sequence of input data across a plurality of clock cycles, where the sequence of input data represents a portion of an input vector. The circuit can determine a plurality of sums and a plurality of sums of squares corresponding to the sequence of input data. The circuit can determine, based on the sums of squares, a first scalar representing the inverse square root of the variance of the vector elements in the input vector. The circuit can determine a second scalar representing the negation of the product of the first scalar and the mean of the vector elements. The circuit can then determine, based on the first scalar, the second scalar, and the received sequence of input data, an output vector that is a normalization of the input vector.
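The two scalars can be derived from running sums: with mean μ = Σx/n and variance σ² = Σx²/n − μ², the circuit only needs s1 = 1/σ and s2 = −s1·μ, after which each element is normalized by a single multiply-add, x·s1 + s2. A sketch under those definitions (the `eps` stabilizer is an assumption of mine):

```python
import numpy as np

# Hypothetical sketch: accumulate sum and sum-of-squares over streamed
# chunks (one per "clock cycle"), derive the two scalars, then normalize
# every element with one fused multiply-add.
def layernorm_streaming(chunks, eps=1e-6):
    n = total = total_sq = 0
    for x in chunks:                       # chunks must be re-iterable
        n += len(x)
        total += x.sum()
        total_sq += (x ** 2).sum()
    mean = total / n
    var = total_sq / n - mean ** 2         # E[x^2] - (E[x])^2
    s1 = 1.0 / np.sqrt(var + eps)          # first scalar: 1/sqrt(variance)
    s2 = -s1 * mean                        # second scalar: -s1 * mean
    return np.concatenate([x * s1 + s2 for x in chunks])
```

Computing variance as Σx²/n − μ² is what lets the circuit finish in one streaming pass instead of revisiting the data to form (x − μ)².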
-
10.
Publication No.: US20240079326A1
Publication Date: 2024-03-07
Application No.: US17903342
Filing Date: 2022-09-06
Inventors: Biswanath Senapati, Seiji Munetoh, Nicholas Anthony Lanzillo, Lawrence A. Clevenger, Geoffrey Burr, Kohji Hosokawa
IPC Classification: H01L23/528, H01L27/24, H01L45/00
CPC Classification: H01L23/5286, H01L27/2436, H01L27/2463, H01L45/06, H01L45/16
Abstract: An IC memory device includes a substrate and an array of memory cells on the substrate. Each memory cell includes at least one memory cell transistor in a layer of the device adjacent to the substrate. In the same layer, the device also includes a plurality of shunt transistors. The device further includes a buried metal signal rail, disposed between the array of memory cells and the plurality of shunt transistors in a buried layer embedded into the substrate below the transistors. The device also includes single-layer vias, in the same layer as the transistors, that electrically connect the memory cell transistors to the shunt transistors through the buried metal signal rail.