摘要:
Two or more pointers, each of which indicates where values of a respective group of bits of a source of a particular micro-operation will be found when the particular micro-operation is executed, may not all point to the same register. Renaming of the source of the particular micro-operation may be enabled by generating one or more new micro-operations that merge the values into a single register. The one or more new micro-operations are inserted into a sequence of micro-operations that includes the particular micro-operation. Once the source of the particular micro-operation has been renamed, subsequent micro-operations in the sequence may be renamed, if appropriate, and executed, without having to wait for the values to be calculated.
摘要:
A method and apparatus for a two micro-operation flow using source override. In one embodiment, the method includes the identification of a macro-instruction having one or more streaming single instruction multiple data extension type operands. Once received, the macro-instruction is decoded into a first micro-operation (uOP) and a second uOP. Once decoded, a signal is asserted to disable source operand override logic if the first micro-operation updates a logical destination register that matches a logical source register of the micro-operation. Otherwise, the mutual source override is active and executed by a register alias table (RAT) when uOP with matching logic source and destination register are detected in a same clock cycle. In doing so, macro-instructions having 128-bit operands may be processed using, for example, two uOPs (one for the lower half and one for the upper half) in a 64-bit implementation, while preserving the atomicity of the original instruction.
摘要:
A method and apparatus for a two micro-operation flow using source override. In one embodiment, the method includes the identification of a macro-instruction having one or more streaming single instruction multiple data extension type operands. Once received, the macro-instruction is decoded into a first micro-operation (uOP) and a second uOP. Once decoded, a signal is asserted to disable source operand override logic if the first micro-operation updates a logical destination register that matches a logical source register of the micro-operation. Otherwise, the mutual source override is active and executed by a register alias table (RAT) when uOP with matching logic source and destination register are detected in a same clock cycle. In doing so, macro-instructions having 128-bit operands may be processed using, for example, two uOPs (one for the lower half and one for the upper half) in a 64-bit implementation, while preserving the atomicity of the original instruction.
摘要:
A method of an aspect includes receiving a packed data rearrangement control indexes generation instruction. The packed data rearrangement control indexes generation instruction indicates a destination storage location. A result is stored in the destination storage location in response to the packed data rearrangement control indexes generation instruction. The result includes a sequence of at least four non-negative integers representing packed data rearrangement control indexes. In an aspect, values of the at least four non-negative integers are not calculated using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.
摘要:
An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking not implemented for a particular data element, then selecting data elements from the destination operand and a second source operand based on index values stored in a first source operand to be copied to data element positions within the destination operand, wherein any one of the data elements from either the destination operand and the second source operand may be copied to any one of the data element positions within the destination operand; if masking is implemented for a particular data element of the destination operand, then performing a designated masking operation with respect to that particular data element.
摘要:
When legacy instructions, that can only operate on smaller registers, are mixed with new instructions in a processor with larger registers, special handling and architecture are used to prevent the legacy instructions from causing problems with the data in the upper portion of the registers, i.e., the portion that they cannot directly access. In some embodiments, the upper portion of the registers are saved to temporary storage while the legacy instructions are operating, and restored to the upper portion of the registers when the new instructions are operating. A special instruction may also be used to disable this save/restore operation if the new instruction are not going to use the upper part of the registers.
摘要:
Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
摘要:
An apparatus is described having an instruction execution pipeline that has a vector functional unit to support a vector multiply add instruction. The vector multiply add instruction to multiply respective K bit elements of two vectors and accumulate a portion of each of their respective products with another respective input operand in an X bit accumulator, where X is greater than K.
摘要:
An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and, mask the second and fourth instructions at a second resultant vector granularity.
摘要:
A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four non-negative integers in numerical order with all integers in consecutive positions differing by a constant stride of at least two. In an aspect, storing the result including the sequence of the at least four integers is performed without calculating the at least four integers using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.