摘要:
The disclosed method may include (1) receiving a precision level of each weight associated with each input of a node of a computational model, (2) identifying, for each weight, one of a plurality of multiplier groups, where each multiplier group may include a plurality of hardware multipliers of a corresponding bit width, and where the corresponding bit width of the plurality of hardware multipliers of the one of the plurality of multiplier groups may be sufficient to multiply the weight by the associated input, and (3) multiplying each weight by its associated input using an available hardware multiplier of the one of the plurality of multiplier groups identified for the weight. Various other processing elements, methods, and systems are also disclosed.
摘要:
The present embodiments relate to performing reduced-precision floating-point arithmetic operations using specialized processing blocks with higher-precision floating-point arithmetic circuitry. A specialized processing block may receive four floating-point numbers that represent two single-precision floating-point numbers, each separated into an LSB portion and an MSB portion, or four half-precision floating-point numbers. A first partial product generator may generate a first partial product of first and second input signals, while a second partial product generator may generate a second partial product of third and fourth input signals. A compressor circuit may generate carry and sum vector signals based on the first and second partial products; and circuitry may anticipate rounding and normalization operations by generating in parallel based on the carry and sum vector signals at least two results when performing the single-precision floating-point operation and at least four results when performing the two half-precision floating-point operations.
摘要:
Multiple data wordlengths may be supported by a processor through a single data path and/or a single set of registers. For example, the processor may support 16-bit wordlengths and 24-bit wordlengths through a single datapath. For supported data wordlengths that are less than the wordlength of the registers and datapath, the data may be left-aligned within the registers and datapath. The left alignment of data may allow saturation detection in the processor to be performed by examining the same saturation point regardless of the wordlength of the data being operated on. A special saturation mode of the processor may set the lower bits to zero when a configuration register or instruction-bit is set and saturation is detected.
摘要:
A tool for supporting vector operations in a scalar data path. The tool determines a location for a split in the scalar mode configuration to enable vector mode operations. The tool determines the number of coarse shift multiplexers in conflict at a bit position receiving data from both the left half and the right half of the vector mode configuration. The tool duplicates a coarse shift multiplexer in conflict at the bit position receiving data from both the left half and the right half of the vector mode configuration. The tool duplicates an intermediate data signal in conflict at a signal position receiving data from both the left half and the right half of the vector mode configuration. The tool receives a control signal to split the scalar mode configuration and shift the leading zeros across the left half and the right half of the vector mode configuration.
摘要:
A tool for supporting vector operations in a scalar data path. The tool determines a location for a split in the scalar mode configuration to enable vector mode operations. The tool determines the number of coarse shift multiplexers in conflict at a bit position receiving data from both the left half and the right half of the vector mode configuration. The tool duplicates a coarse shift multiplexer in conflict at the bit position receiving data from both the left half and the right half of the vector mode configuration. The tool duplicates an intermediate data signal in conflict at a signal position receiving data from both the left half and the right half of the vector mode configuration. The tool receives a control signal to split the scalar mode configuration and shift the leading zeros across the left half and the right half of the vector mode configuration.
摘要:
A floating point execution unit is capable of selectively repurposing a subset of the significand bits in a floating point value for use as additional exponent bits to dynamically provide an extended range for floating point calculations. A significand field of a floating point operand may be considered to include first and second portions, with the first portion capable of being concatenated with the second portion to represent the significand for a floating point value, or, to provide an extended range, being concatenated with the exponent field of the floating point operand to represent the exponent for a floating point value.
摘要:
A hardware circuit component configured to support vector operations in a scalar data path. The hardware circuit component configured to operate in a vector mode configuration and in a scalar mode configuration. The hardware circuit component configured to split the scalar mode configuration into a left half and a right half of the vector mode configuration. The hardware circuit component configured to perform one or more bit shifts over one or more stages of interconnected multiplexers in the vector mode configuration. The hardware circuit component configured to include duplicated coarse shift multiplexers at bit positions that receive data from both the left half and the right half of the vector mode configuration, resulting in one or more coarse shift multiplexers sharing the bit position.
摘要:
An adder for supporting multiple data types by controlling a carry propagation is provided. The adder includes a plurality of first addition areas configured to receive pieces of incoming operand data, wherein each of the plurality of first addition areas includes a predetermined unit number of bits, and a plurality of second addition areas configured to receive pieces of control data based on a type of the operand data and an operation type, wherein the plurality of second addition areas are alternately arranged between the plurality of first addition areas.
摘要:
An encryption chip is programmable to process a variety of secret key and public key encryption algorithms. The chip includes a pipeline of processing elements, each of which can process a round within a secret key algorithm. Data is transferred between the processing elements through dual port memories. A central processing unit allows for processing of very wide data words from global memory in single cycle operations. An adder circuit is simplified by using plural relatively small adder circuits with sums and carries looped back in plural cycles. Multiplier circuitry can be shared between the processing elements and the central processor by adapting the smaller processing element multipliers for concatenation as a very wide central processor multiplier.
摘要:
Digital signal processing (“DSP”) circuit blocks are provided that can more easily work together to perform larger (e.g., more complex and/or more arithmetically precise) DSP operations if desired. These DSP blocks may also include redundancy circuitry that facilitates stitching together multiple such blocks despite an inability to use some block (e.g., because of a circuit defect). Systolic registers may be included at various points in the DSP blocks to facilitate use of the blocks to implement systolic form, finite-impulse-response (“FIR”), digital filters.