Abstract:
An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
Abstract:
A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.
Abstract:
A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.
Abstract:
An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
Abstract:
An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
Abstract:
An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
Abstract:
An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
Abstract:
A processor comprising a register file, and a decoder to decode an instruction to specify a first source register having a first packed signed 16-bit integers, and to specify a second source register having a second packed signed 16-bit integers. A functional unit to generate a result to be stored in a specified destination. The result including a third packed 8-bit integers including an integer for each integer in the first packed integers, and an integer for each integer in the second packed integers. The integers corresponding to the first packed integers next to one another in the result. The integers corresponding to the second packed integers next to one another. A highest order integer of the result corresponding to a highest order integer of the first packed integers. A lowest order integer of the result corresponding to a lowest order integer of the second packed integers.
Abstract:
A system and method for producing a fused instruction is described. In one embodiment, a first instruction and a second instruction that are both simple instructions (e.g., perform only one operation) and are dependent are fused together to create the fused instruction. The fused instruction has an opcode that represents the operation performed by the first instruction and the operation performed by the second instruction. The fused instruction has three source operands and one destination operand. Two of the three source operands are the two source operands of the first instruction, and the third source operand is the source operand of the second instruction that is not the destination operand of the first instruction. The destination operand of the fused instruction is the destination operand of the second instruction. An execution unit that can execute a fused instruction in one clock cycle is also disclosed. In one embodiment, the execution unit has two arithmetic logic units (“ALUs”), each of the ALUs performs one of the two operations of the fused instruction. The result of the first ALU is input into the second ALU to produce the desired result.
Abstract:
According to an embodiment of the present invention, a microprocessor includes an expanded logical register set that can be accessed by instructions including legacy opcodes and remapped addressing mode information. The known IA-32 instruction set is limited to accessing eight logical general integer registers. An IA-32 instruction can specify which of the eight logical general integer registers are to be accessed via 3-bit register identifier fields of the addressing mode information of the instruction. Each 3-bit register identifier can specify any of the eight logical general integer registers. An expanded logical register set (e.g., sixteen logical registers, thirty-two logical registers, sixty-four logical registers, etc.) can be accessed by remapping the addressing mode information to include at least four-bit register identifiers without defining new opcodes or defining additional instruction prefixes.