Abstract:
Embodiments of systems, apparatuses, and methods for performing a blend instruction in a computer processor are described. In some embodiments, the execution of a blend instruction causes a data element-by-element selection of data elements of first and second source operands using the corresponding bit positions of a writemask as a selector between the first and second operands and storage of the selected data elements into the destination at the corresponding position in the destination.
Abstract:
Embodiments of systems, apparatuses, and methods for performing gather and scatter stride instruction in a computer processor are described. In some embodiments, the execution of a gather stride instruction causes a conditionally storage of strided data elements from memory into the destination register according to at least some of bit values of a writemask
Abstract:
An apparatus includes a decode unit to decode a permute instruction and a vector conflict instruction. A vector execution unit is coupled with the decode unit and includes a fully-connected interconnect. The fully-connected interconnect has at least four inputs to receive at least four corresponding data elements of at least one source vector. The fully-connected interconnect has at least four outputs. Each of the at least four inputs is coupled with each of the at least four outputs. The execution unit also includes a permute instruction execution logic coupled with the at least four outputs and operable to store a first vector result in response to the permute instruction. The execution unit also includes a vector conflict instruction execution logic coupled with the at least four outputs and operable to store a second vector result in a destination storage location in response to the vector conflict instruction.
Abstract:
Embodiments of systems, apparatuses, and methods for performing an expand and/or compress instruction in a computer processor are described. In some embodiments, the execution of an expand instruction causes the selection of elements from a source that are to be sparsely stored in a destination based on values of the writemask and store each selected data element of the source as a sparse data element into a destination location, wherein the destination locations correspond to each writemask bit position that indicates that the corresponding data element of the source is to be stored.
Abstract:
A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
Abstract:
Embodiments of systems, apparatuses, and methods for performing a jump instruction in a computer processor are described. In some embodiments, the execution of a blend instruction causes a conditional jump to an address of a target instruction when all of bits of a writemask are zero, wherein the address of the target instruction is calculated using an instruction pointer of the instruction and the relative offset.