摘要:
A method and apparatus for performing a move mask operation. The present invention provides a method and apparatus for performing operations on packed data values of a first size and format and conversion of the results to data of a second size and format by eliminating redundant data. The present invention is useful, for example, when comparisons are performed on floating point data that is typically larger (e.g., 64 bits) than integer data (e.g., 32 bits) and integer operations are preformed based on the result. Because many processors branch based on integer data, the comparison results stored as floating point data must be transferred to an integer register prior to branching. The present invention takes advantage of redundancy of the floating point comparison results to transfer enough data to convey the comparison result to integer registers with a single instruction.
摘要:
A method and instruction for converting a number between a floating point format and an integer format are described. Numbers are stored in the integer format in a register of a first set of architectural registers in a scalar format. At least one of the numbers in the scalar format is converted to a number in the floating point format. The number in the floating point format is placed in a register of a second set of architectural registers in a packed format.
摘要:
A method and instruction for converting a number from a floating point format to an integer format are described. Numbers are stored in the floating point format in a register of a first set of architectural registers in a packed format. At least one of the numbers in the floating point format is converted to at least one 16-bit number in the integer format. The 16-bit number in the integer format is placed in a register of a second set of architectural registers in the packed format.
摘要:
A method and apparatus for executing partial-width packed data instructions are discussed. The processor may include a plurality of registers, a register renaming unit, a decoder, and a partial-width execution unit. The register renaming unit provides an architectural register file to store packed data operands each of which include a plurality of data elements. The decoder is to decode a first and second set of instructions that each specify one or more registers in the architectural register file. The first set of instructions specify operations to be performed on all of the data elements stored in the one or more specified registers. In contrast, the second set of instructions specify operations to be performed on only a subset of the data elements. The partial-width execution unit is to execute operations specified by either of the first or the second set of instructions.
摘要:
A method and apparatus are disclosed for staggering execution of an instruction. According to one embodiment of the invention, a single macro instruction is received wherein the single macro instruction specifies at least two logical registers and wherein the two logical registers respectively store a first and second packed data operands having corresponding data elements. An operation specified by the single macro instruction is then performed independently on a first and second plurality of the corresponding data elements from said first and second packed data operands at different times using the same circuit to independently generate a first and second plurality of resulting data elements. The first and second plurality of resulting data elements are stored in a single logical register as a third packed data operand.
摘要:
A method and apparatus for parallel processing of graphics data are described. A number of color components are stored in a floating point format in at least one register of a set of 128-bit registers in a packed format. The color components in the floating point format are converted to numbers in an integer format. The numbers in the integer format are placed in at least one register of a set of 64-bit registers in the packed format. Color components are assembled for image pixels from the numbers in the integer format.
摘要:
A method and apparatus are disclosed for staggering execution of an instruction. According to one embodiment of the invention, a single macro instruction is received wherein the single macro instruction specifies at least two logical registers and wherein the two logical registers respectively store a first and second packed data operands having corresponding data elements. An operation specified by the single macro instruction is then performed independently on a first and second plurality of the corresponding data elements from said first and second packed data operands at different times using the same circuit to independently generate a first and second plurality of resulting data elements. The first and second plurality of resulting data elements are stored in a single logical register as a third packed data operand.
摘要:
A method and instruction for converting a number between a floating point format and an integer format are described. Numbers are stored in the integer format in a register of a first set of architectural registers in a packed format. At least one of the numbers in the integer format is converted to at least one number in the floating point format. The numbers in the floating point format are placed in a register of a second set of architectural registers in a packed format.
摘要:
A method and instruction for converting a number from a floating point format to an integer format are described. Numbers are stored in the floating point format in a register of a first set of architectural registers in a packed format. At least one of the numbers in the floating point format is converted to at least one 8-bit number in the integer format. The 8-bit number in the integer format is placed in a register of a second set of architectural registers in the packed format.
摘要:
A method and apparatus are disclosed for staggering execution of an instruction. According to one embodiment of the invention, a single macro instruction is received wherein the single macro instruction specifies at least two logical registers and wherein the two logical registers respectively store a first and second packed data operands having corresponding data elements. An operation specified by the single macro instruction is then performed independently on a first and second plurality of the corresponding data elements from said first and second packed data operands at different times using the same circuit to independently generate a first and second plurality of resulting data elements. The first and second plurality of resulting data elements are stored in a single logical register as a third packed data operand.