Abstract:
Systems and methods for selective refresh of a cache, such as a last-level cache implemented as an embedded DRAM (eDRAM). A refresh bit and a reuse bit are associated with each way of at least one set of the cache. A least recently used (LRU) stack tracks positions of the ways relative to a threshold: positions between the threshold and the most recently used position are more recently used positions, and positions between the threshold and the least recently used position are less recently used positions. A line in a way is selectively refreshed if the position of the way is one of the more recently used positions and the refresh bit associated with the way is set, or if the position of the way is one of the less recently used positions and both the refresh bit and the reuse bit associated with the way are set.
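The refresh condition can be sketched as a small predicate. This is an illustrative sketch only: the names `Way` and `should_refresh`, the convention that position 0 is the most recently used, and the rule that positions below `threshold` count as more recently used are all assumptions, not details taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Way:
    """Per-way state bits (hypothetical container for illustration)."""
    refresh_bit: bool
    reuse_bit: bool


def should_refresh(lru_position: int, threshold: int, way: Way) -> bool:
    """Decide whether the line held in `way` is refreshed.

    Assumed convention: position 0 is the most recently used, and
    positions below `threshold` are the "more recently used" group.
    """
    if lru_position < threshold:
        # More recently used: the refresh bit alone is sufficient.
        return way.refresh_bit
    # Less recently used: both the refresh bit and the reuse bit are required.
    return way.refresh_bit and way.reuse_bit
```

Under these assumptions, a way near the LRU end whose line has not been reused is skipped even when its refresh bit is set.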
Abstract:
Systems and methods relate to a low-dropout (LDO) voltage regulator which receives a maximum supply voltage and provides a regulated voltage to a load, where the load may be a processing core of a multi-core processing system. A leakage current supply source includes a leakage current sensor to determine a leakage current demand of the load of the LDO voltage regulator and a leakage current supply circuit to supply the leakage current demand. In this manner, the leakage current supply source provides current assistance to the LDO voltage regulator, such that the LDO voltage regulator needs to supply only dynamic current. Thus, the headroom voltage of the LDO voltage regulator, which is the difference between the maximum supply voltage and the regulated voltage, can be reduced. Reducing the headroom voltage allows a greater number of dynamic voltage and frequency scaling states of the load.
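The link between headroom voltage and the number of available scaling states can be illustrated with simple arithmetic. The function name `usable_dvfs_states` and all voltage values below are hypothetical and chosen only for illustration; the abstract gives no numbers.

```python
def usable_dvfs_states(v_max_mv, min_headroom_mv, candidate_states_mv):
    """Return the regulated voltages (in mV) that leave enough headroom.

    Headroom is the maximum supply voltage minus the regulated voltage,
    as defined in the abstract. All values here are illustrative.
    """
    return [v for v in candidate_states_mv if v_max_mv - v >= min_headroom_mv]


# Hypothetical operating points for a 1000 mV maximum supply.
states = [600, 700, 800, 900, 950]
with_large_headroom = usable_dvfs_states(1000, 200, states)  # regulator supplies all current
with_small_headroom = usable_dvfs_states(1000, 50, states)   # leakage supplied separately
```

In this made-up example, lowering the required headroom from 200 mV to 50 mV raises the number of reachable states from three to five.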
Abstract:
A compute-in-memory array is provided that implements a filter for a layer in a neural network. The filter multiplies a plurality of activation bits by a plurality of filter weight bits for each channel in a plurality of channels through a charge accumulation from a plurality of capacitors. The accumulated charge is digitized to provide the output of the filter.
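A behavioral sketch of the multiply, accumulate, and digitize sequence follows. It assumes that each one-bit multiplication is a logical AND that charges one unit capacitor, and that the converter is a uniform `adc_bits`-bit quantizer. Neither detail comes from the abstract, and `cim_filter` is a hypothetical name.

```python
def cim_filter(activations, weights, adc_bits=3):
    """Model one filter output of a compute-in-memory array.

    Assumptions (not from the source): a 1-bit multiply is an AND that
    charges one unit capacitor per channel, and the ADC is uniform.
    """
    if len(activations) != len(weights):
        raise ValueError("one weight bit is needed per activation bit")
    # Charge accumulation: count the channels whose product is 1.
    charged = sum(a & w for a, w in zip(activations, weights))
    full_scale = len(activations)
    levels = (1 << adc_bits) - 1
    # Round-to-nearest quantization using integer arithmetic only.
    return (charged * levels * 2 + full_scale) // (2 * full_scale)


code = cim_filter([1, 1, 0, 1], [1, 0, 1, 1])
```

Here two of the four channels produce a product of 1, so the accumulated charge is half of full scale and a 3-bit converter returns code 4.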
Abstract:
A dual-mode memory is provided that includes a self-timed clock circuit for asserting a sense enable signal for a sense amplifier. In a low-bandwidth read mode, the self-timed clock circuit asserts the sense enable signal only once during a memory clock cycle. The sense amplifier then senses only a single bit from a group of multiplexed columns. In a high-bandwidth read mode, the self-timed clock circuit successively asserts the sense enable signal so that the sense amplifier successively senses bits from the multiplexed columns.
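The two read modes can be contrasted with a short behavioral model. This is a sketch under stated assumptions: the mode strings, the `selected` parameter, and the function name `read_cycle` are hypothetical, and each list element returned stands for one assertion of the sense enable signal.

```python
def read_cycle(columns, mode, selected=0):
    """Return the bits sensed in one memory clock cycle.

    `columns` holds the bit on each multiplexed column sharing one sense
    amplifier. Each returned bit corresponds to one sense-enable pulse.
    """
    if mode == "low_bandwidth":
        # Sense enable asserted once: a single column is sensed.
        return [columns[selected]]
    if mode == "high_bandwidth":
        # Sense enable asserted successively: each column is sensed in turn.
        return [columns[i] for i in range(len(columns))]
    raise ValueError(f"unknown mode: {mode}")
```

With four multiplexed columns, the low-bandwidth mode yields one bit per cycle while the high-bandwidth mode yields four.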
Abstract:
Certain aspects of the present disclosure provide apparatus and methods for performing memory read operations. One example method generally includes precharging a plurality of memory columns during a precharging phase of a read access cycle. The method also includes sensing first data stored in a first memory cell of a first memory column of the plurality of memory columns during a memory read phase of the read access cycle, and sensing second data stored in a second memory cell of a second memory column of the plurality of memory columns during the same memory read phase of the read access cycle.
Abstract:
A method for multiplication and accumulation includes performing multiplications on a first set of bits and a second set of bits to generate first products, and performing multiplications on a third set of bits and a fourth set of bits to generate second products. The method also includes summing the first products to generate a first sum, changing a bit value of one of the second products, and summing the second products to generate a second sum. The method further includes averaging the first sum and the second sum to obtain an average of the first sum and the second sum, converting the average of the first sum and the second sum into a digital signal, and shifting and adding a one to the digital signal.
Abstract:
Methods and apparatus for performing machine learning tasks and, in particular, a neural-network-processing architecture and circuits for improved performance through depth parallelism. One example neural-network-processing circuit generally includes a plurality of groups of processing element (PE) circuits, wherein each group of PE circuits comprises a plurality of PE circuits configured to process in parallel an input at a plurality of depths.
Abstract:
Systems and methods are directed to a configurable last level clock driver coupled to an inductor-capacitor (LC) tank or resonant clock, for improving energy efficiency of the resonant clock. In a warm up stage, the last level clock driver can be enabled to store energy in the LC tank, and in a gating stage, the last level clock driver can be fully or partially disabled such that energy stored in the LC tank can be recirculated into a clock distribution network. In a refreshing stage, the last level clock driver can be enabled to replenish the energy lost by the LC tank in the recirculation of energy into the clock distribution network during the gating stage. Programmable counters can be used to control durations of the warm up, gating, and refreshing stages.
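The counter-controlled staging can be sketched as a per-cycle schedule. Several points here are assumptions rather than details from the abstract: the gating and refreshing stages are taken to alternate indefinitely after a single warm up, the partially disabled case is simplified to fully disabled, and `driver_schedule` is a hypothetical name.

```python
from itertools import islice


def driver_schedule(warm_up, gating, refreshing):
    """Yield (stage, driver_enabled) for each clock cycle.

    The three arguments stand in for the programmable counter values.
    Assumed behavior: one warm up, then gating and refreshing alternate.
    """
    if min(warm_up, gating, refreshing) < 1:
        raise ValueError("each stage must last at least one cycle")
    for _ in range(warm_up):
        yield ("warm_up", True)          # driver stores energy in the LC tank
    while True:
        for _ in range(gating):
            yield ("gating", False)      # tank energy recirculates into the network
        for _ in range(refreshing):
            yield ("refreshing", True)   # driver replenishes the lost energy


first_cycles = list(islice(driver_schedule(2, 3, 1), 8))
```

With counters of 2, 3, and 1, the driver is enabled for 3 of the first 8 cycles in this model.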