Abstract:
A processor includes an execution unit, a fault mask coupled to the execution unit, and a suppress mask coupled to the execution unit. The fault mask is to store a first plurality of bit values to indicate which elements of a multi-element vector have an associated fault generated in response to execution of an instruction on that element in the execution unit. The suppress mask is to store a second plurality of bit values to indicate which of the elements are to have an associated fault suppressed. The processor also includes counter logic to increment a counter in response to both an indication, received from the fault mask, of a first fault associated with a first element and an indication, received from the suppress mask, of a first suppression associated with the first element. Other embodiments are described and claimed.
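The counting rule can be modeled in a few lines of software. The sketch below is a minimal illustration, not the claimed hardware. It assumes that bit i of each mask describes vector element i, and the function name count_suppressed_faults is hypothetical.

```python
def count_suppressed_faults(fault_mask: int, suppress_mask: int, num_elements: int) -> int:
    """Return how many elements have both a fault bit and a suppress bit set.

    Hypothetical model: bit i of each mask describes vector element i.
    """
    counter = 0
    for i in range(num_elements):
        faulted = (fault_mask >> i) & 1        # fault generated for element i?
        suppressed = (suppress_mask >> i) & 1  # fault on element i to be suppressed?
        if faulted and suppressed:
            counter += 1  # counter logic increments only when both indications arrive
    return counter


# Faults on elements 0, 1, 3; suppression requested for elements 1, 2.
print(count_suppressed_faults(0b1011, 0b0110, 4))  # 1 (element 1 only)
```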
Abstract:
Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register having a variable plurality of data fields, each data field to store an offset for a data element in a memory. A destination register has corresponding data fields, each to store a variable second plurality of bits forming a conflict mask with one mask bit for each offset. Responsive to decoding a vector conflict instruction, execution units compare the offset in each data field with the offset in every less significant data field to determine whether they match and, in the corresponding conflict mask in the destination register, set each mask bit that corresponds to a less significant data field holding a matching offset. Vector address conflict detection can be used with variable-sized elements and to generate conflict masks that resolve dependencies in gather-modify-scatter SIMD operations.
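A scalar model makes the comparison concrete. In this sketch, the function name vector_conflict is hypothetical and offsets are plain Python integers. Element i's conflict mask has bit j set when j < i and the offsets in fields j and i match.

```python
def vector_conflict(offsets: list[int]) -> list[int]:
    """Return one conflict mask per element (hypothetical scalar model).

    Bit j of masks[i] is set when j < i (a less significant field) and
    offsets[j] == offsets[i].
    """
    masks = []
    for i, offset in enumerate(offsets):
        mask = 0
        for j in range(i):  # compare only against less significant fields
            if offsets[j] == offset:
                mask |= 1 << j
        masks.append(mask)
    return masks


offsets = [4, 7, 4, 9, 4, 7]
print([bin(m) for m in vector_conflict(offsets)])
# ['0b0', '0b0', '0b1', '0b0', '0b101', '0b10']
```

Element 4 shares offset 4 with elements 0 and 2, so its mask is 0b101. Element 0 always gets an empty mask because no field is less significant than it.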
Abstract:
Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register having a variable plurality of data fields, each data field to store a variable second plurality of bits. A destination register has corresponding data fields, each to store the number of bits set to one in the corresponding data field of that register. Responsive to decoding a vector population count instruction, execution units count the bits set to one in each data field of the register and store the counts in the corresponding data fields of the destination register. Vector population count instructions can be used with variable-sized elements and conflict masks to generate iteration counts and the completion masks used in each iteration to resolve dependencies in gather-modify-scatter SIMD operations.
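The sketch below applies a per-field population count to conflict masks of the kind produced by a vector conflict instruction, then groups elements by their count. Treating an element's count as the iteration in which it can safely be processed, and building completion masks that way, are assumptions made for illustration; the abstract does not specify the construction. Both function names are hypothetical, and fields are assumed to be non-negative.

```python
def vector_popcount(fields: list[int]) -> list[int]:
    """Count the bits set to one in each (non-negative) data field."""
    return [bin(field).count("1") for field in fields]


def completion_masks(counts: list[int]) -> list[int]:
    """Hypothetical helper: mask k selects the elements whose count equals k."""
    if not counts:
        return []
    masks = []
    for k in range(max(counts) + 1):
        mask = 0
        for i, count in enumerate(counts):
            if count == k:
                mask |= 1 << i
        masks.append(mask)
    return masks


# Conflict masks for offsets [4, 7, 4, 9, 4, 7] (bit j set: earlier element j matches).
conflict = [0b0, 0b0, 0b1, 0b0, 0b101, 0b10]
counts = vector_popcount(conflict)
print(counts)                                      # [0, 0, 1, 0, 2, 1]
print([bin(m) for m in completion_masks(counts)])  # ['0b1011', '0b100100', '0b10000']
```

Under this reading, iteration 0 updates elements 0, 1, and 3; iteration 1 updates elements 2 and 5; and iteration 2 updates element 4, so no two elements with the same offset are updated in the same iteration.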
Abstract:
Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction that includes an opcode, an operand to store a portion of a fallback address, and an operand to store a stride value; and execution hardware to execute the decoded instruction to initiate a DSX region by activating DSX tracking hardware, which tracks speculative memory accesses and detects ordering violations in the DSX region, and by storing the fallback address.
Abstract:
In several embodiments, vector extensions to an instruction set architecture include instructions to perform saturated signed and unsigned integer additions. In one embodiment, a vector signed integer add with signed saturation is provided. In one embodiment, a vector unsigned integer add with unsigned saturation is provided. In one embodiment, packed doubleword and quadword integers are supported for both signed and unsigned instructions.
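Saturating addition clamps each lane's sum to the lane's representable range instead of letting it wrap around. The following sketch models doubleword (32-bit) and quadword (64-bit) lanes as Python integers. The function names are hypothetical, and the unsigned variant assumes non-negative inputs.

```python
def add_signed_saturate(a: list[int], b: list[int], bits: int) -> list[int]:
    """Lane-wise signed add, clamped to [-2**(bits-1), 2**(bits-1) - 1]."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [min(max(x + y, lo), hi) for x, y in zip(a, b)]


def add_unsigned_saturate(a: list[int], b: list[int], bits: int) -> list[int]:
    """Lane-wise unsigned add of non-negative inputs, clamped to 2**bits - 1."""
    hi = (1 << bits) - 1
    return [min(x + y, hi) for x, y in zip(a, b)]


DWORD, QWORD = 32, 64
print(add_signed_saturate([2**31 - 1, -(2**31), 5], [1, -1, -3], DWORD))
# [2147483647, -2147483648, 2]
print(add_unsigned_saturate([2**64 - 1, 10], [1, 20], QWORD))
# [18446744073709551615, 30]
```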
Abstract:
Detailed herein are systems, apparatuses, and methods for strided loads. In an embodiment, an apparatus includes a decoder to decode an instruction, wherein the instruction includes fields for a starting source memory address operand and a starting destination register operand; and execution circuitry to execute the decoded instruction to extract data elements of a defined number of types from contiguous memory beginning at the starting source memory address and, for each type, store the extracted data elements in a packed data register dedicated to that type, beginning with the register identified by the starting destination register operand.
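A typical use of such a load is splitting interleaved records (for example, x, y, z triples) into one register per field. The sketch below is a software model under stated assumptions: memory is a flat list, each register holds a fixed number of lanes, and registers[t] stands for the destination register t positions after the starting destination register. The name strided_load and its parameters are hypothetical.

```python
def strided_load(memory: list[int], start: int, num_types: int, lanes: int) -> list[list[int]]:
    """Hypothetical model: de-interleave `num_types` element types.

    Memory from `start` is assumed to hold records laid out as
    t0, t1, ..., t(num_types-1), t0, t1, ... Register t receives the
    elements of type t; registers[0] stands for the starting destination
    register and registers[t] for the register t positions after it.
    """
    registers = []
    for t in range(num_types):
        # Elements of type t sit every `num_types` slots, starting at start + t.
        registers.append([memory[start + t + k * num_types] for k in range(lanes)])
    return registers


# Four interleaved (x, y, z) records.
mem = [10, 20, 30, 11, 21, 31, 12, 22, 32, 13, 23, 33]
print(strided_load(mem, start=0, num_types=3, lanes=4))
# [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
```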
Abstract:
Embodiments of systems, apparatuses, and methods for getting even or odd data elements are described. For example, in some embodiments, an apparatus includes a decoder to decode an instruction, wherein the instruction includes fields for a first source operand, a second source operand, and a destination operand; and execution circuitry to execute the decoded instruction to extract the data elements at even data element positions of the first and second source operands and store the extracted data elements into the destination operand.
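A software model of the even-element case, with its odd-element counterpart, might look like the following. It assumes that the even-position elements of the first source fill the low half of the destination and those of the second source fill the high half; the abstract does not state this ordering. The function names are hypothetical.

```python
def get_even(src1: list[int], src2: list[int]) -> list[int]:
    """Even-position elements of src1, followed by those of src2 (assumed order)."""
    return src1[0::2] + src2[0::2]


def get_odd(src1: list[int], src2: list[int]) -> list[int]:
    """Odd-position counterpart, under the same ordering assumption."""
    return src1[1::2] + src2[1::2]


a = [0, 1, 2, 3]
b = [4, 5, 6, 7]
print(get_even(a, b))  # [0, 2, 4, 6]
print(get_odd(a, b))   # [1, 3, 5, 7]
```

With two equal-length sources, the destination has the same length as either source, so a single register can hold the result.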
Abstract:
An apparatus and method are described for down-converting a source operand to a destination operand with masking. For example, a method according to one embodiment includes the following operations: reading a source operand value that is to be down-converted from a first value to a down-converted value and stored in a destination location; reading each mask register bit stored in a mask register, the mask register bit(s) indicating whether to perform a masking operation or a conversion operation on the source operand value; if the mask register bit(s) indicate that a masking operation is to be performed, performing a specified masking operation and storing the result of the masking operation in the destination location; and if the mask register bit(s) indicate that a masking operation is not to be performed, down-converting the source operand value and storing the down-converted value in the destination location.
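The per-element decision can be sketched as follows. The abstract does not fix either the conversion or the masking operation, so this sketch assumes a signed 32-to-16-bit down-conversion with saturation, a mask bit of 1 meaning "convert" and 0 meaning "mask", and a masking operation that either zeroes the element or keeps the old destination value (merging). All names are hypothetical.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def masked_down_convert(src: list[int], dest: list[int], mask: int, zeroing: bool) -> list[int]:
    """Hypothetical model of a masked 32-to-16-bit signed down-conversion.

    Mask bit i == 1: convert src[i] with signed saturation.
    Mask bit i == 0: masking operation (zero the element, or keep dest[i]).
    """
    result = []
    for i, (value, old) in enumerate(zip(src, dest)):
        if (mask >> i) & 1:
            result.append(min(max(value, INT16_MIN), INT16_MAX))  # down-convert
        else:
            result.append(0 if zeroing else old)  # zeroing vs merging mask
    return result


src = [100, 70000, -70000, 5]
dest = [1, 2, 3, 4]
print(masked_down_convert(src, dest, 0b0111, zeroing=False))  # [100, 32767, -32768, 4]
print(masked_down_convert(src, dest, 0b0111, zeroing=True))   # [100, 32767, -32768, 0]
```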
Abstract:
A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, an integer offset, and a destination storage location. In response to the instruction, a result is stored in the destination storage location. The result includes a sequence of at least four integers in numerical order, in which the smallest integer differs from zero by the integer offset and integers in consecutive positions differ by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
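The result described above is an arithmetic progression in which element i equals offset + i * stride. The following minimal sketch assumes a positive stride, so that the sequence is in ascending order, and takes the smallest element to be the offset itself. The function name is hypothetical.

```python
def stride_offset_sequence(offset: int, stride: int, count: int = 4) -> list[int]:
    """Hypothetical model: element i = offset + i * stride, with count >= 4."""
    if count < 4:
        raise ValueError("the abstract describes at least four integers")
    return [offset + i * stride for i in range(count)]


print(stride_offset_sequence(3, 2))     # [3, 5, 7, 9]
print(stride_offset_sequence(0, 1, 8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```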
Abstract:
Embodiments of systems, apparatuses, and methods are described for performing, in a computer processor, a vector packed horizontal partial sum of packed data elements in response to a single vector packed horizontal sum instruction that includes a destination vector register operand, a source vector register operand, and an opcode.
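The abstract does not define exactly which partial sums are produced. One common reading is an inclusive prefix (running) sum across the elements of the source vector, and the sketch below assumes that reading. The function name and the use of unbounded Python integers (no lane overflow) are also assumptions.

```python
from itertools import accumulate


def horizontal_partial_sum(src: list[int]) -> list[int]:
    """Assumed semantics: destination lane i = src[0] + ... + src[i]."""
    return list(accumulate(src))


print(horizontal_partial_sum([1, 2, 3, 4]))  # [1, 3, 6, 10]
```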