摘要:
An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking not implemented for a particular data element, then selecting data elements from a first source operand and a second source operand based on index values stored in destination operand to be copied to data element positions within the destination operand, wherein any one of the data elements from either the first source operand and the second source operand may be copied to any one of the data element positions within the destination operand; and if masking is implemented for a particular data element of the destination operand, then performing a designated masking operation with respect to that particular data element.
摘要:
An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.
摘要:
An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The first data structure is four times as large as the second data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second instruction to create a second replication data structure.
摘要:
An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking not implemented for a particular data element, then selecting data elements from a first source operand and a second source operand based on index values stored in destination operand to be copied to data element positions within the destination operand, wherein any one of the data elements from either the first source operand and the second source operand may be copied to any one of the data element positions within the destination operand; and if masking is implemented for a particular data element of the destination operand, then performing a designated masking operation with respect to that particular data element.
摘要:
Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal partial sum of packed data elements in response to a single vector packed horizontal sum instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.
摘要:
Embodiments of systems, apparatuses, and methods for performing in a computer processor conversion of a mask register into a vector register in response to a single vector packed convert a mask register to a vector register instruction that includes a destination vector register operand, a source writemask register operand, and an opcode are described.
摘要:
Embodiments of systems, apparatuses, and methods for performing in a computer processor absolute difference calculation in response to a single vector packed absolute difference instruction that includes a first and second source vector register operand, a destination vector register operand, and an opcode are described.
摘要:
Embodiments of systems, apparatuses, and methods for performing in a computer processor mask extraction from a general purpose register in response to a single mask extraction from a general purpose register instruction that includes a source general purpose register operand, a destination writemask register operand, an immediate value, and an opcode are described.
摘要:
An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.
摘要:
A processing core implemented on a semiconductor chip is described. The processing core includes logic circuitry to identify whether vector instructions and integer scalar instructions are to be executed with two registers or three registers, where, in the case of two registers input operand information is destroyed in one of two registers, and, in the case of three registers input operand is not destroyed. The processing core also includes steering circuitry coupled to the logic circuitry. The steering circuitry is to control first data paths between scalar integer execution units and a scalar integer register bank such that two registers are accessed from the scalar register bank if two register execution is identified for the scalar integer instructions or three registers are accessed from the scalar integer register bank if three register execution is identified for the scalar integer instructions. The steering circuitry is also to control second data paths between vector execution units and a vector register bank such that two registers are accessed from the vector register bank if two register execution is identified for the vector instructions or three registers are accessed from the vector register bank if three register execution is identified for the vector instructions.