Abstract:
A system, method, and computer program product are provided for compiling a computer program comprising arithmetic operations having different requirements with respect to numeric dynamic range, numeric resolution, or any combination thereof. The method comprises generating a transformed graph representation of the computer program by applying propagation rules that provide for relaxed numeric requirements, where applicable, and generating output code based on the transformed graph representation. Relaxing numeric requirements, such as dynamic range and resolution requirements, may advantageously lower power consumption during execution of the computer program.
Abstract:
A system includes a control circuit and first, second, and third ground-referenced single-ended signaling (GRS) driver circuits that are each coupled to an output signal. The control circuit is configured to generate a first, second, and third set of control signals that are each based on a respective phase of a clock signal. Each GRS driver circuit is configured to pre-charge a capacitor to store a charge based on the respective set of control signals during at least one phase of the clock signal and drive the output signal relative to a ground network by discharging the charge during a respective phase of the clock signal.
Abstract:
A system of interconnected chips comprising a multi-chip module (MCM) includes a first processor chip, a graphics processing cluster (GPC) chip, and an MCM package configured to include the first processor chip, the GPC chip, and an interconnect circuit. The first processor chip is configured to include a first ground-referenced single-ended signaling interface circuit. A first set of electrical traces fabricated within the MCM package and configured to couple the first single-ended signaling interface circuit to the interconnect circuit. The GPC chip is configured to include a second single-ended signaling interface circuit and to execute shader programs. A second set of electrical traces fabricated within the MCM package and configured to couple the second single-ended signaling interface circuit to the interconnect circuit. In one embodiment, each single-ended signaling interface advantageously implements ground-referenced single-ended signaling.
Abstract:
A method and a system are provided for variation-tolerant synchronization. A phase value representing a phase of a second clock signal relative to a first clock signal and a period value representing a relative period between the second clock signal and the first clock signal are received. An extrapolated phase value of the second clock signal relative to the first clock signal corresponding to a next transition of the first clock signal is computed based on the phase value and the period value.
Abstract:
One embodiment of the present invention sets forth a technique for technique for capturing and storing a level of an input signal using a dual-trigger low-energy flip-flop circuit that is fully-static and insensitive to fabrication process variations. The dual-trigger low-energy flip-flop circuit presents only three transistor gate loads to the clock signal and none of the internal nodes toggle when the input signal remains constant. One of the clock signals may be a low-frequency “keeper clock” that toggles less frequently than the other two clock signal that is input to two transistor gates. The output signal Q is set or reset at the rising clock edge using separate trigger sub-circuits. Either the set or reset may be armed while the clock signal is low, and the set or reset is triggered at the rising edge of the clock.
Abstract:
A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. Compressed-sparse data is received for input to a processing element, wherein the compressed-sparse data encodes non-zero elements and corresponding multi-dimensional positions. The non-zero elements are processed in parallel by the processing element to produce a plurality of result values. The corresponding multi-dimensional positions are processed in parallel by the processing element to produce destination addresses for each result value in the plurality of result values. Each result value is transmitted to a destination accumulator associated with the destination address for the result value.
Abstract:
A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.
Abstract:
A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.
Abstract:
A system and method are provided for generating non-overlapping enable signals. A peak voltage level is measured at an output of a current source that is configured to provide current to a voltage control mechanism. The non-overlapping enable signals are generated for the voltage control mechanism based on the peak voltage level. A system includes the current source, a downstream controller, and the voltage control mechanism that is coupled to the load. The current source is configured to provide current to the voltage control mechanism. The controller is configured to measure the peak voltage level at the output of the current source and generate the non-overlapping enable signals based on the peak voltage level. The non-overlapping enable signals provide a portion of the current to the load.
Abstract:
A system and method are provided for regulating a voltage level at a load. A current source generates a current and a voltage control mechanism provides a portion of the current to regulate the voltage level at the load. When the voltage level at the load is greater than a maximum voltage level, the current source is decoupled from the load and the current source is coupled to a current sink to reduce the voltage level at the load. An electric power conversion comprises the current source and the voltage control mechanism. A downstream controller is configured to control the voltage control mechanism to decouple the current source from the load and couple the current source to a current sink to reduce the voltage level at the load when the voltage level at the load is greater than a maximum voltage level.