Abstract:
Methods and apparatus for performing machine learning tasks, and in particular, a hybrid architecture that includes both neural processing unit (NPU) and compute-in-memory (CIM) elements. One example neural-network-processing circuit generally includes a plurality of CIM processing elements (PEs), a plurality of neural processing unit (NPU) PEs, and a bus coupled to the plurality of CIM PEs and to the plurality of NPU PEs. One example method for neural network processing generally includes processing data in a neural-network-processing circuit comprising a plurality of CIM PEs, a plurality of NPU PEs, and a bus coupled to the plurality of CIM PEs and to the plurality of NPU PEs; and transferring the processed data between at least one of the plurality of CIM PEs and at least one of the plurality of NPU PEs via the bus.
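The following is a minimal behavioral sketch of the hybrid arrangement described above, not the claimed circuit itself: CIM PEs hold stationary weights and compute in place, NPU PEs apply digital post-processing, and a shared bus object moves processed data between them. All class and method names are hypothetical.

```python
import numpy as np

class CimPe:
    def __init__(self, weights):
        self.weights = weights              # weights stored "in memory"

    def process(self, activations):
        return self.weights @ activations   # in-memory multiply-accumulate

class NpuPe:
    def process(self, partial_sums):
        return np.maximum(partial_sums, 0)  # e.g., digital ReLU / post-processing

class Bus:
    def transfer(self, data, dst):
        # In hardware this is a shared interconnect; here it simply hands data over.
        return dst.process(data)

cim = CimPe(np.random.randn(4, 8))
npu = NpuPe()
bus = Bus()

x = np.random.randn(8)
partial = cim.process(x)            # process data in a CIM PE
out = bus.transfer(partial, npu)    # transfer the processed data to an NPU PE via the bus
print(out)
```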
Abstract:
A compute-in-memory bitcell is provided that includes a pair of cross-coupled inverters for storing a stored bit. The compute-in-memory bitcell includes a logic gate for multiplying the stored bit with an input vector bit. An output node for the logic gate connects to a second plate of a capacitor. A first plate of the capacitor connects to a read bit line. A write driver controls a power supply voltage to the cross-coupled inverters, a first switch, and a second switch to capacitively write the stored bit to the pair of cross-coupled inverters.
Abstract:
Certain aspects provide methods and apparatus for multiplication of digital signals. In accordance with certain aspects, a multiplication circuit may be used to multiply a portion of a first digital input signal with a portion of a second digital input signal via a first multiplier circuit to generate a first multiplication signal, and multiply another portion of the first digital input signal with another portion of the second digital input signal via a second multiplier circuit to generate a second multiplication signal. A third multiplier circuit and multiple adder circuits may be used to generate an output of the multiplication circuit based on the first and second multiplication signals.
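One way to read the three-multiplier-plus-adders arrangement above is as a Karatsuba-style split multiplication; the sketch below illustrates that reading and is an assumption, not the claimed circuit. Two multipliers form partial products of the operand portions, a third multiplier plus adders recover the cross terms, and shifts combine everything into the full-width result.

```python
def split(x, k):
    """Split a 2k-bit value into its high and low k-bit portions."""
    return x >> k, x & ((1 << k) - 1)

def multiply_split(a, b, k=8):
    ah, al = split(a, k)
    bh, bl = split(b, k)
    p_hi = ah * bh                 # first multiplier circuit: high portions
    p_lo = al * bl                 # second multiplier circuit: low portions
    p_mid = (ah + al) * (bh + bl)  # third multiplier circuit
    cross = p_mid - p_hi - p_lo    # adder circuits recover the cross terms
    return (p_hi << (2 * k)) + (cross << k) + p_lo

assert multiply_split(0xBEEF, 0x1234) == 0xBEEF * 0x1234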
Abstract:
Certain aspects provide a circuit for in-memory computation. The circuit generally includes a first memory cell, and a first computation circuit. The first computation circuit may include a first switch having a control input coupled to an output of the first memory cell, a second switch coupled between a node of the first computation circuit and the first switch, a control input of the second switch being coupled to a discharge word-line (DCWL), a capacitive element coupled between the node and a reference potential node, a third switch coupled between the node and a read bit-line (RBL), and a fourth switch coupled between the node and an activation (ACT) line.
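Below is a behavioral sketch of one plausible operating sequence for the computation circuit, with signal polarities and phase ordering assumed for illustration rather than taken from the source: the ACT-line switch charges the capacitive node with the activation, the DCWL-gated path conditionally discharges it based on the memory cell output, and the RBL switch then shares the surviving charge onto the read bit-line.

```python
def compute_cell(stored_bit: int, activation: int) -> int:
    # Phase 1: close the fourth switch (ACT line) to charge the capacitive node.
    node = activation
    # Phase 2: assert DCWL (second switch); if the memory cell output closes the
    # first switch, the series path pulls the node to the reference potential.
    if stored_bit:
        node = 0
    # Phase 3: close the third switch to dump the node charge onto the RBL.
    return node   # equals activation AND (NOT stored_bit) under these assumptions

# Summing contributions from a column of cells approximates a dot product.
weights = [1, 0, 1, 1]
acts    = [1, 1, 0, 1]
print(sum(compute_cell(w, a) for w, a in zip(weights, acts)))
```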
Abstract:
A device capacitor structure within middle of line (MOL) layers includes a first MOL interconnect layer. The first MOL interconnect layer may include active contacts between a set of dummy gate contacts on an active surface of a semiconductor substrate. The device capacitor structure also includes a second MOL interconnect layer. The second MOL interconnect layer may include a set of stacked contacts directly on exposed ones of the active contacts. The second MOL interconnect layer may also include a set of fly-over contacts on portions of an etch-stop layer on some of the active contacts. The fly-over contacts and the stacked contacts may provide terminals of a set of device capacitors.
Abstract:
Systems and methods are directed to a three-terminal semiconductor device including a self-aligned via for connecting to a gate terminal. Hardmasks and spacers formed over top portions and sidewall portions of a drain connection to a drain terminal and a source connection to a source terminal protect and insulate the drain connection and the source connection, such that short circuits are avoided between the source and drain connections and the self-aligned via. The self-aligned via provides a direct metal-gate connection path between the gate terminal and a metal line, such as an M1 metal line, while avoiding a separate gate connection layer.
Abstract:
A semiconductor device includes a gate and a first active contact adjacent to the gate. Such a device further includes a first stacked contact electrically coupled to the first active contact, the first stacked contact including a first isolation layer on its sidewalls that electrically isolates the first stacked contact from the gate. The device also includes a first via electrically coupled to the gate and landing on the first stacked contact. The first via electrically couples the first stacked contact and the first active contact to the gate to ground the gate.
Abstract:
Methods and apparatus for performing machine learning tasks, and in particular, a neural-network-processing architecture and circuits for improved performance through depth parallelism. One example neural-network-processing circuit generally includes a plurality of groups of processing element (PE) circuits, wherein each group of PE circuits comprises a plurality of PE circuits configured to process an input at a plurality of depths in parallel.
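A minimal sketch of depth parallelism as described above (all names are hypothetical): each PE circuit in a group handles a different slice of the input depth (channel) dimension, and the per-depth partial sums are accumulated into one output, so the depths are processed in parallel rather than serially.

```python
import numpy as np

def pe_circuit(x_slice, w_slice):
    # One PE circuit: multiply-accumulate over its assigned depth slice.
    return float(np.dot(x_slice, w_slice))

def pe_group(x, w, num_pes):
    # Split the depth dimension across the PE circuits of the group.
    x_slices = np.array_split(x, num_pes)
    w_slices = np.array_split(w, num_pes)
    partials = [pe_circuit(xs, ws) for xs, ws in zip(x_slices, w_slices)]
    return sum(partials)   # accumulate the per-depth partial sums

depth = 64
x = np.random.randn(depth)   # input with 64 depths (channels)
w = np.random.randn(depth)   # weights for one output
print(np.isclose(pe_group(x, w, num_pes=8), np.dot(x, w)))   # True
```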
Abstract:
Methods and apparatus for performing machine learning tasks, and in particular, a neural-network-processing architecture and circuits for improved handling of partial accumulation results in weight-stationary operations, such as operations occurring in compute-in-memory (CIM) processing elements (PEs). One example PE circuit for machine learning generally includes an accumulator circuit, a flip-flop array having an input coupled to an output of the accumulator circuit, a write register, and a first multiplexer having a first input coupled to an output of the write register, having a second input coupled to an output of the flip-flop array, and having an output coupled to a first input of the accumulator circuit.
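The following is a behavioral sketch of the partial-accumulation datapath described above; the register and multiplexer behavior shown is an illustrative assumption, not the claimed circuit. The flip-flop array holds the running partial sum, the write register lets a previously saved partial sum be loaded back in, and the first multiplexer selects which of the two feeds the accumulator circuit.

```python
class PartialAccumulationPe:
    def __init__(self):
        self.flip_flop_array = 0   # running partial sum
        self.write_register = 0    # externally written partial sum

    def load_partial(self, value):
        self.write_register = value

    def accumulate(self, mac_result, select_write_register=False):
        # First multiplexer: choose the write register or the flip-flop array.
        previous = self.write_register if select_write_register else self.flip_flop_array
        # The accumulator output is captured by the flip-flop array.
        self.flip_flop_array = previous + mac_result
        return self.flip_flop_array

pe = PartialAccumulationPe()
pe.accumulate(5)                                       # 5
pe.accumulate(3)                                       # 8: continue accumulating
pe.load_partial(100)                                   # restore a saved partial result
print(pe.accumulate(2, select_write_register=True))    # 102
```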
Abstract:
In one embodiment, a method of simulating an operation of an artificial neural network on a binary neural network processor includes receiving a binary input vector for a layer having a probabilistic binary weight matrix and performing vector-matrix multiplication of the input vector with the probabilistic binary weight matrix, where the multiplication results are modified by simulated binary-neural-processing hardware noise, to generate a binary output vector. The simulation is performed in the forward pass of a training algorithm for a neural network model of the binary-neural-processing hardware.
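A hedged sketch of such a simulated forward pass follows: each weight is sampled as +1 with its stored probability (else -1), the vector-matrix product is perturbed by a Gaussian term standing in for binary-neural-processing hardware noise, and the result is re-binarized. The Gaussian noise model and sign binarization are illustrative assumptions, not the claimed method.

```python
import numpy as np

def simulated_binary_forward(x_bin, weight_probs, noise_std=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Sample a binary weight matrix in {-1, +1} from the per-weight probabilities.
    w_bin = np.where(rng.random(weight_probs.shape) < weight_probs, 1.0, -1.0)
    # Vector-matrix multiplication, modified by simulated hardware noise.
    pre_act = x_bin @ w_bin + rng.normal(0.0, noise_std, size=w_bin.shape[1])
    # Binarize to produce the binary output vector.
    return np.sign(pre_act)

x = np.sign(np.random.randn(16))   # binary input vector in {-1, +1}
p = np.random.rand(16, 8)          # probabilistic binary weight matrix
print(simulated_binary_forward(x, p))
```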