Abstract:
An apparatus is provided which comprises: a multiplexer which is gated by a clock; and a flip-flop coupled to the multiplexer, wherein the flip-flop has a chain of at least four inverters one of which has an input to receive the clock.
Abstract:
In an embodiment, a router includes multiple input ports and output ports, where the router is of a source-synchronous hybrid network on chip (NoC) to enable communication between routers of the NoC based on transitions in control flow signals communicated between the routers. Other embodiments are described and claimed.
Abstract:
A FPMAC operation has two operands: an input operand and a weight operand. The operands may have a format of FP16, BF16, or INT8. Each operand is split into two portions. The two portions are stored in separate storage units. Then operands are transferred to register files of a PE, with each register file storing bits of an operand sequentially. The PE performs the FPMAC operation based on the operands. The PE may include an FPMAC unit configured to compute an individual partial sum of the PE. The PE may also include an FP adder to accumulate the individual partial sum with other data, such as an output from another PE or an output form another PE array. The FP adder may be fused with the FPMAC unit in a single circuit that can do speculative alignment and has separate critical paths for alignment and normalization.
Abstract:
In one embodiment, a processor comprises a first neuro-synaptic core comprising first circuitry to configure the first neuro-synaptic core as a neuron core responsive to a first value specified by a configuration parameter; and configure the first neuro-synaptic core as a synapse core responsive to a second value specified by the configuration parameter.
Abstract:
Some embodiments include apparatus and methods using an input stage and an output stage of a circuit. The input stage operates to receive an input signal and a clock signal and to provide an internal signal at an internal node based at least in part on the input signal. The input signal has levels in a first voltage range. The internal signal has levels in a second voltage range greater than the first voltage range. The output stage operates to receive the internal signal, the clock signal, and an additional signal generated based on the input signal. The output stage provides an output signal based at least in part on the input signal and the additional signal. The output signal has a third voltage range greater than the first voltage range.
Abstract:
An apparatus is provided which comprises: a clock node; a first inverter having an input coupled to the clock node; a data node; a master latch with a shared p-type keeper coupled to an output of the first inverter, the master latch coupled to the data node; and a slave latch coupled to an output of the master latch, the slave latch having a shared p-type keeper and a shared n-type footer, wherein the shared p-type keeper and the shared n-type footer of the slave latch are coupled to the clock node and the input of the first inverter.
Abstract:
Described is a latch which comprises: a first AND-OR-invert (AOI) logic gate; and a second AOI logic gate coupled to the first AOI logic gate, wherein the first and second AOI logic gates have respective first and second keeper devices coupled to a power supply node. Described is a flip-flop which comprises: a first latch including: a first AOI logic gate; and a second AOI logic gate coupled to the first AOI logic gate, wherein the first and second AOI logic gates have respective first and second keeper devices coupled to a power supply, the first latch having an output node; and a second latch having an input node coupled to the output node of the first latch, the second latch having an output node to provide an output of the flip-flop.
Abstract:
In an embodiment, a processor includes a compression domain threshold filter coupled to a plurality of cores. The compression domain threshold filter is to: receive a sample vector of compressed data to be filtered; calculate, based at least on a first subset of the elements of the sample vector, an estimated upper bound value of a dot product of the sample vector and a steering vector; determine whether the estimated upper bound value of the dot product satisfies a filter threshold value; and in response to a determination that the estimated upper bound value of the dot product does not satisfy the filter threshold value, discard the sample vector without completion of a calculation of the dot product of the sample vector and the steering vector. Other embodiments are described and claimed.
Abstract:
Described is a latch which comprises: a first AND-OR-invert (AOI) logic gate; and a second AOI logic gate coupled to the first AOI logic gate, wherein the first and second AOI logic gates have respective first and second keeper devices coupled to a power supply node. Described is a flip-flop which comprises: a first latch including: a first AOI logic gate; and a second AOI logic gate coupled to the first AOI logic gate, wherein the first and second AOI logic gates have respective first and second keeper devices coupled to a power supply, the first latch having an output node; and a second latch having an input node coupled to the output node of the first latch, the second latch having an output node to provide an output of the flip-flop.
Abstract:
Apparatus and method for a scalable, free running neuromorphic processor. For example, one embodiment of a neuromorphic processing apparatus comprises: a plurality of neurons; an interconnection network to communicatively couple at least a subset of the plurality of neurons; a spike controller to stochastically generate a trigger signal, the trigger signal to cause a selected neuron to perform a thresholding operation to determine whether to issue a spike signal.