Abstract:
A processor includes a processor core and a calculation circuit. The processor core includes logic determine a set of weights for use in a convolutional neural network (CNN) calculation and scale up the weights using a scale value. The calculation circuit includes logic to receive the scale value, the set of weights, and a set of input values, wherein each input value and associated weight of a same fixed size. The calculation circuit also includes logic to determine results from convolutional neural network (CNN) calculations based upon the set of weights applied to the set of input values, scale down the results using the scale value, truncate the scaled down results to the fixed size, and communicatively couple the truncated results to an output for a layer of the CNN.
Abstract:
A processing device includes a processor core and a number of calculation modules that each is configurable to perform any one of operations for a convolutional neuron network system. A first set of the calculation modules are configured to perform convolution operations, a second set of the calculation modules are reconfigured to perform averaging operations, and a third set of the calculation modules are reconfigured to perform dot product operations.
Abstract:
Systems and methods for performing convolution operations. An example processing system comprises: a processing core; and a convolver unit to apply a convolution filter to a plurality of input data elements represented by a two-dimensional array, the convolver unit comprising a plurality of multipliers coupled to two or more sets of latches, wherein each set of latches is to store a plurality of data elements of a respective one-dimensional section of the two-dimensional array.
Abstract:
A processing device includes a processor core and a number of calculation modules that each is configurable to perform any one of operations for a convolutional neuron network system. A first set of the calculation modules are configured to perform convolution operations, a second set of the calculation modules are reconfigured to perform averaging operations, and a third set of the calculation modules are reconfigured to perform dot product operations.
Abstract:
Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier. Final results are selected respectively using overflow detection.
Abstract:
A processor includes a front end, a decoder, an allocator, and a retirement unit. The decoder includes logic to identify an end-of-live-range (EOLR) indicator. The EOLR indicator specifies an architectural register and a location in code for which the architectural register is unused. The allocator includes logic to scan for a mapping of the architectural register to a physical register, based upon the EOLR indicator. The allocator also includes logic to generate a request to disassociate the architectural register from the physical register. The retirement unit includes logic to disassociate the architectural register from the physical register.
Abstract:
Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier. Final results are selected respectively using overflow detection.
Abstract:
Systems and methods for performing convolution operations. An example processing system comprises: a processing core; and a convolver unit to apply a convolution filter to a plurality of input data elements represented by a two-dimensional array, the convolver unit comprising a plurality of multipliers coupled to two or more sets of latches, wherein each set of latches is to store a plurality of data elements of a respective one-dimensional section of the two-dimensional array.
Abstract:
A computer-readable storage medium, method and system for optimization-level aware branch prediction is described. A gear level is assigned to a set of application instructions that have been optimized. The gear level is also stored in a register of a branch prediction unit of a processor. Branch prediction is then performed by the processor based upon the gear level.
Abstract:
A processing device includes a processor core and a number of calculation modules that each is configurable to perform any one of operations for a convolutional neuron network system. A first set of the calculation modules are configured to perform convolution operations, a second set of the calculation modules are reconfigured to perform averaging operations, and a third set of the calculation modules are reconfigured to perform dot product operations.