摘要:
Embodiments of systems, apparatuses, and methods for performing a blend instruction in a computer processor are described. In some embodiments, the execution of a blend instruction causes a data element-by-element selection of data elements of first and second source operands using the corresponding bit positions of a writemask as a selector between the first and second operands and storage of the selected data elements into the destination at the corresponding position in the destination.
摘要:
Embodiments of systems, apparatuses, and methods for performing an align instruction in a computer processor are described. In some embodiments, the execution of an align instruction causes the selective storage of data elements of two concatenated sources to be stored in a destination.
摘要:
An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.
摘要:
A method performed by a processor is described. The method includes executing an instruction. The instruction has an address as an operand. The executing of the instruction includes sending a signal to cache coherence protocol logic of the processor. In response to the signal, the cache coherence protocol logic issues a request for ownership of a cache line at the address. The cache line is not in a cache of the processor. The request for ownership also indicates that the cache line is not to be sent to the processor.
摘要:
Instructions and logic provide vector linear interpolation functionality. In some embodiments, responsive to an instruction specifying: a first operand from a set of vector registers, a size of each of the vector elements, a portion of the vector elements upon which to compute linear interpolations, a second operand from a set of vector registers, and a third operand; an execution unit, reads a first, a second and a third value of the size of vector elements from corresponding data fields in the first, the second and the third operand respectively and computes an interpolated value as the first value multiplied by the second value minus the second value multiplied by the third value plus the third value.
摘要:
A processing core is described having execution unit logic circuitry having a first register to store a first vector input operand, a second register to a store a second vector input operand and a third register to store a packed data structure containing scalar input operands a, b, c. The execution unit logic circuitry further include a multiplier to perform the operation (a*(first vector input operand))+(b*(second vector operand))+c.
摘要:
A method performed by a processor is described. The method includes executing an instruction. The instruction has an address as an operand. The executing of the instruction includes sending a signal to cache coherence protocol logic of the processor. In response to the signal, the cache coherence protocol logic issues a request for ownership of a cache line at the address. The cache line is not in a cache of the processor. The request for ownership also indicates that the cache line is not to be sent to the processor.
摘要:
A processing core is described having execution unit logic circuitry having a first register to store a first vector input operand, a second register to a store a second vector input operand and a third register to store a packed data structure containing scalar input operands a, b, c. The execution unit logic circuitry further include a multiplier to perform the operation (a*(first vector input operand))+(b*(second vector operand))+c.