Abstract:
A data processing system supports vector operands with components representing different bit significance portions of an integer number. Processing circuitry performs a processing operation specified by a program instruction in dependence upon a number of components comprising the vector as specified by metadata for the vector.
Abstract:
An apparatus 8 for performing a selectable one of multi-element comparison and multi-element addition is formed from several stages. A carry propagate adders stage 12 is supplied with four non-final intermediate operands formed from the input vector. A non-final limit value selecting stage 14, when performing a multi-element comparison, selects, in dependence upon at least carry save values generated by the carry propagate adders stage 12, limit values that are the larger or the smaller value of a pair of elements. A final intermediate operand forming stage 16 forms final intermediate operands from two non-final intermediate sum values from the carry propagate adders stage 12 and supplies these to a final output adder stage 18, which sums the two final intermediate operands to generate an output operand. The output operand can be one or more candidates for a limit value (a maximum or minimum value), or, in the case of a multi-element addition, a sum value or partial sum values.
Abstract:
A data processing apparatus has floating-point add circuitry for performing a floating-point add operation for adding or subtracting two floating-point operands. The apparatus also has reciprocal estimation circuitry for performing a reciprocal estimation operation on a first operand to generate a reciprocal estimate value, which represents an estimate of a reciprocal of the first operand or an estimate of a reciprocal of the square root of the first operand. The reciprocal estimation circuitry is physically distinct from the floating-point add circuitry, which allows both the reciprocal estimation and the add operations to be faster.
Abstract:
Apparatus for data processing and a method of data processing are provided. Shift circuitry performs a shift operation in response to a shift instruction, shifting bits of an input data value in a direction specified by the shift instruction. Bit location indicator generation circuitry and comparison circuitry operate in parallel with the shift circuitry. The bit location indicator generation circuitry generates a bit location indicator, which indicates at least one bit location in the input data value which must not have a bit set if the shifted data value is not to saturate. The comparison circuitry compares the bit location indicator with the input data value and indicates a saturation condition if any bit location indicated by the bit location indicator holds a set bit in the input data value. A faster indication of the saturation condition thus results.
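The saturation check described above can be sketched in software. This is only an illustration under stated assumptions, not the circuit from the abstract: the value is unsigned, the shift is to the left, the width is 8 bits, and the function name `saturating_shift_left` is hypothetical. The indicator marks the top `shift` bits, which must all be clear for the shifted result to fit. It depends only on the shift amount, so it can be formed without waiting for the shifted value.

```python
def saturating_shift_left(value, shift, width=8):
    """Left shift of an unsigned value with saturation (illustrative sketch)."""
    full = (1 << width) - 1
    assert 0 <= value <= full and shift >= 0

    # Bit location indicator: the top `shift` bits of the input must not be
    # set, otherwise the shift would push a set bit out of the result.
    s = min(shift, width)
    indicator = (full >> (width - s)) << (width - s)

    # Comparison: any overlap between the indicator and the input saturates.
    saturates = (value & indicator) != 0

    # The shift itself is computed independently of the check above.
    shifted = (value << shift) & full

    return (full if saturates else shifted), saturates


if __name__ == "__main__":
    for amount in (3, 4):
        print(amount, saturating_shift_left(0b00010000, amount))
```

In this sketch the saturation flag never reads `shifted`, which mirrors the parallel arrangement the abstract describes.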
Abstract:
A data processing apparatus and method are provided for performing a shift function on a binary number. The apparatus comprises count determination circuitry for determining a number of contiguous bit positions in the binary number that have a predetermined bit value, the count determination circuitry outputting a count value indicative of the number of contiguous bit positions determined. In parallel with the operation of the count determination circuitry, coarse shifting circuitry is used to determine, for at least one predetermined number of contiguous bit positions, whether that predetermined number of contiguous bit positions within the binary number has said predetermined bit value. An initial shift operation is then performed on the binary number based on that determination in order to produce an intermediate binary number. Once the count value is available from the count determination circuitry, fine shifting circuitry then performs a further shift operation on the intermediate binary number, based on the count value output by the count determination circuitry, in order to produce the result binary number. This provides an efficient mechanism for performing a shift function on a binary number, whilst still capturing the count value from the count determination circuitry.
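A minimal software sketch of the coarse/fine split follows. It rests on assumptions that are not in the abstract: a 32-bit word, leading zeros as the counted bit value, a left shift that normalizes the number, and a single coarse step of 16 positions. The names `WIDTH`, `COARSE` and `normalize` are hypothetical.

```python
WIDTH = 32    # assumed word width
COARSE = 16   # assumed single coarse shift amount


def count_leading_zeros(x, width=WIDTH):
    """Number of contiguous zero bits starting at the most significant bit."""
    return width - x.bit_length()


def normalize(x):
    """Shift x left so its leading one reaches the top bit; also return the count."""
    assert 0 <= x < (1 << WIDTH)
    mask = (1 << WIDTH) - 1

    # Count determination: the slower path in hardware.
    count = count_leading_zeros(x)

    # Coarse shift: needs only a check that the top COARSE bits are zero,
    # so it does not have to wait for `count`.
    top_is_zero = (x >> (WIDTH - COARSE)) == 0
    coarse = COARSE if top_is_zero else 0
    intermediate = (x << coarse) & mask

    # Fine shift: applies whatever part of the count the coarse step left over.
    fine = count - coarse
    result = (intermediate << fine) & mask
    return result, count


if __name__ == "__main__":
    print(normalize(0x00000001))
    print(normalize(0x00010000))
```

The count is returned alongside the shifted value, reflecting the abstract's point that the count value is still captured.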
Abstract:
An apparatus, a method, and a non-transitory computer-readable medium to store computer-readable code for fabrication of an apparatus are provided. The apparatus comprises instruction decode circuitry to decode instructions and processing circuitry to execute the instructions decoded by the instruction decode circuitry. The processing circuitry comprises chained-floating-point-multiply-accumulate circuitry responsive to a chained-floating-point-multiply-accumulate instruction decoded by the instruction decode circuitry, the chained-floating-point-multiply-accumulate instruction specifying a first floating-point operand, a second floating-point operand and a third floating-point operand, to: generate an unrounded product based on multiplying the first floating-point operand and the second floating-point operand; generate a first rounding increment based on the unrounded product; generate a sum based on adding the unrounded product, a value based on the first rounding increment, and the third floating-point operand; determine a second rounding increment based on the sum; and perform rounding based on the second rounding increment.
Abstract:
A data processing apparatus is provided. Intermediate value generation circuitry generates an intermediate value from a first floating point number and a second floating point number. The intermediate value includes a number of leading 0s indicative of a prediction of a number of leading 0s in a difference between absolute values of the first floating point number and the second floating point number. The prediction differs by at most one from the number of leading 0s in the difference between absolute values of the first floating point number and the second floating point number. Count circuitry counts the number of leading 0s in said intermediate value, and mask generation circuitry produces one or more masks using the intermediate value. The mask generation circuitry produces the one or more masks at the same time as, or before, the count circuitry counts the number of leading 0s in the intermediate value.
Abstract:
A floating point adder includes leading zero anticipation circuitry 18 to determine a number of leading zeros within a result significand value of a sum of a first floating point operand and a second floating point operand. This number of leading zeros is used to generate a mask which in turn selects input bits from a non-normalized significand produced by adding the first significand value and the second significand value. The non-normalized significand is then normalized at the same time as the output rounding bits used to round the normalized significand value are generated by rounding bit generation circuitry 40.
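The idea of selecting rounding inputs with a mask, rather than waiting for normalization, can be illustrated as follows. This sketch makes assumptions that go beyond the abstract: a 16-bit non-normalized significand, an 8-bit kept result, an exact leading zero count supplied as input, and guard and sticky bits as the rounding bits. All names are hypothetical.

```python
W = 16  # assumed width of the non-normalized significand
P = 8   # assumed number of significand bits kept after normalization


def rounding_bits_from_mask(raw, lz):
    """Pick guard and sticky bits straight from the non-normalized value."""
    # After a left shift by lz, the guard bit sits at position W - P - 1.
    # Before the shift, the same bit is lz positions lower.
    guard_mask = (1 << (W - P - 1)) >> lz
    sticky_mask = guard_mask - 1 if guard_mask else 0
    return int((raw & guard_mask) != 0), int((raw & sticky_mask) != 0)


def rounding_bits_after_normalizing(raw, lz):
    """Reference path: normalize first, then read the rounding bits."""
    normalized = (raw << lz) & ((1 << W) - 1)
    guard = (normalized >> (W - P - 1)) & 1
    sticky = int((normalized & ((1 << (W - P - 1)) - 1)) != 0)
    return guard, sticky


if __name__ == "__main__":
    raw = 0x05B6
    lz = W - raw.bit_length()
    print(rounding_bits_from_mask(raw, lz),
          rounding_bits_after_normalizing(raw, lz))
```

Both functions agree for every input, but the mask-based one never reads the normalized value, so in hardware it could run alongside the normalizing shift.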
Abstract:
Processing circuitry performs a plurality of lanes of processing on respective data elements of at least one operand vector to generate corresponding result data elements of a result vector. The processing circuitry identifies lane position information for each lane of processing, the lane position information for a given lane identifying a relative position of the corresponding result data element to be generated by the given lane within a corresponding result data value spanning one or more result data elements of the result vector. The processing circuitry is configured to perform each lane of processing in dependence on the lane position information identified for that lane. This enables generation of results which are wider or narrower than the vector size supported in hardware.
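One possible reading of lane position information is sketched below for integer addition. Everything here is assumed for illustration: 8-bit lanes stored least significant first, a `span` parameter giving the number of lanes per result value, and carry handling as the lane-dependent behaviour. The abstract does not limit itself to this operation.

```python
LANE_BITS = 8  # assumed hardware lane width


def lanewise_add(a_lanes, b_lanes, span):
    """Add two vectors of lanes, where each result value spans `span` lanes."""
    mask = (1 << LANE_BITS) - 1
    result = []
    carry = 0
    for index, (a, b) in enumerate(zip(a_lanes, b_lanes)):
        # Lane position information: where this lane sits inside its value.
        position = index % span
        # Only lanes above the lowest lane of a value accept a carry-in.
        carry_in = carry if position != 0 else 0
        total = a + b + carry_in
        result.append(total & mask)
        carry = total >> LANE_BITS
    return result


if __name__ == "__main__":
    a = [0xFF, 0xFF, 0x01, 0x00]
    b = [0x01, 0x00, 0x00, 0x00]
    for span in (1, 2, 4):
        print(span, lanewise_add(a, b, span))
```

The same four lanes yield four 8-bit sums, two 16-bit sums, or one 32-bit sum depending on `span`, which illustrates a result wider than a single lane.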
Abstract:
An apparatus and method for floating-point multiplication are provided. Two partial products are generated from two operand significands, which are then added to generate a product significand. The value of an unbiased result exponent is determined from the operand exponent values and leading zero counts, and a shift amount and direction for the product significand are determined in dependence on a predetermined minimum exponent value of a predetermined canonical format. The product significand is shifted by the shift amount in the shift direction. An overflow mask identifying an overflow bit position of the product significand is generated by right shifting a predetermined mask pattern by the shift amount, and the overflow mask is applied to the product significand to extract an overflow value at the overflow bit position. This extraction of the overflow value happens before the shift circuitry shifts the product significand, allowing an overall faster floating-point multiplication to be performed.
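The overflow extraction step can be sketched as below, assuming the product significand is shifted left and the overflow bit of the shifted value sits at a fixed position. Bit 15 is chosen arbitrarily here, and the constant and function names are hypothetical. Shifting the mask right by the same amount locates that bit in the unshifted product.

```python
OVERFLOW_BIT = 1 << 15  # assumed overflow position in the shifted significand


def overflow_before_shift(product, shift):
    """Read the overflow bit from the unshifted product with a shifted mask."""
    overflow_mask = OVERFLOW_BIT >> shift
    return int((product & overflow_mask) != 0)


def overflow_after_shift(product, shift):
    """Reference path: shift the product first, then test the overflow bit."""
    return int(((product << shift) & OVERFLOW_BIT) != 0)


if __name__ == "__main__":
    product, shift = 0b1011_0000_0000, 4
    print(overflow_before_shift(product, shift),
          overflow_after_shift(product, shift))
```

Because `overflow_before_shift` never touches the shifted product, the overflow value is available without waiting for the significand shifter.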