摘要:
A semiconductor processor is described. The semiconductor processor includes logic circuitry to perform a logical reduction instruction. The logic circuitry has swizzle circuitry to swizzle a vector's elements so as to form a swizzle vector. The logic circuitry also has vector logic circuitry to perform a vector logic operation on said vector and said swizzle vector.
摘要:
Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier. Final results are selected respectively using overflow detection.
摘要:
Computer method and apparatus for performing a square root or division operation generating a root or quotient is presented. A partial remainder is stored in radix-2 or radix-4 signed digit format. A decoder is provided for computing a root or quotient digit, and a correction term dependent on a number of the most significant digits of the partial remainder. An adder is provided for computing the sum of the signed digit partial remainder and the correction term in binary format, and providing the result in signed digit format. The adder computes a carry out independent of a carry in bit and a sum dependent on a Carry_in bit providing a fast adder independent of carry propagate delays. The scaler performs a multiplication by two of the result output from the adder in signed digit format to provide a signed digit next partial remainder.
摘要:
Methods and apparatus for extending OpenvSwitch (OVS) megaflow offloads to hardware to address hardware pipeline limitations. Under a method implemented on a compute platform including a Network Interface Controller (NIC) having one or more ports and running software including OVS software and a Linux operating system having a kernel including a TC-flower module and a NIC driver a new megaflow is created with a mask in the OVS software employing a subset of microflow fields for a packet. The microflow fields and the megaflow mask is sent to the NIC driver. A new megaflow is implemented in the NIC driver employing a subset of the microflow fields and the NIC driver creates a new hardware flow on the NIC employing a packet match scheme using all the microflow fields. The NIC also programs a NIC hardware pipeline with the new hardware flow using a match scheme that may depend on the available hardware resources, such as the size of a TCAM.
摘要:
Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier. Final results are selected respectively using overflow detection.
摘要:
Methods, systems and computer program products for dynamic selection and switching of TCP congestion control algorithms over a TCP connection. Exemplary embodiments include a TCP congestion control algorithm management method, including establishing a first TCP connection on a first network having an end point, wherein the TCP connection includes a first TCP congestion control algorithm, monitoring path characteristics of the TCP connection and dynamically selecting and switching to a second TCP congestion control algorithm in a response to a change in the path characteristics of the TCP connection.
摘要:
Computer method and apparatus for performing a square root or division operation generating a root or quotient. A partial remainder is stored in radix-2 or radix-4 signed digit format. A decoder is provided for computing a root or quotient digit, and a correction term dependent on a number of the most significant digits of the partial remainder. An adder is provided for computing the sum of the signed digit partial remainder and the correction term in binary format, and providing the result in signed digit format. The adder computes a carry out independent of a carry in bit and a sum dependent on a Carry_in bit providing a fast adder independent of carry propagate delays. The scaler performs a multiplication by two of the result output from the adder in signed digit format to provide a signed digit next partial remainder.
摘要:
The invention provides computer apparatus for performing a square root or division operation generating a root or quotient. A partial remainder is stored in radix-2 or radix-4 signed digit format. A decoder is provided for computing a root or quotient digit, and a correction term dependent on a number of the most significant digits of the partial remainder. An adder is provided for computing the sum of the signed digit partial remainder and the correction term in binary format, and providing the result in signed digit format. The adder computes a carry out independent of a carry in bit and a sum dependent on a Carry_in bit providing a fast adder independent of carry propagate delays. The scaler performs a multiplication by two of the result output from the adder in signed digit format to provide a signed digit next partial remainder.
摘要:
A method and apparatus are presented for efficient implementation of logic and arithmetic functions that generate sets of mutually exclusive output signals. Such a logic family includes a network of NMOS transistors that implements a desired logic function. Coupled to that network is a minimal number of PMOS devices for providing logic level restoration and for compensating for any voltage drops due to the NMOS transistors. With such a structure, the speed, area and power consumption characteristics of logic functions are improved.
摘要:
The arithmetic operations performed for floating point format numbers involve procedures having a multiplicity of major steps. The effective subtraction operation can be accelerated by using two methods of execution depending on whether the absolute value of the difference between the arguments of the exponents, ABS{DELTA(E)} is .ltoreq.1 or >1. The procedure for ABS{DELTA(E)}.ltoreq.1 requires more major process steps than the situation where ABS{DELTA(E)}.ltoreq.1. To accelerate only the procedure having more major process steps, the two least significant bits of both exponent arguments are examined and based on the examination, the lengthier procedure can be initiated in parallel with the process step determining the value of ABS{DELTA(E)}. When the lengthier procedure is determined to be inappropriate based on the determined value, the results of the lengthier process can be discarded. Otherwise, the lengthier process, already in progress, is continued.