摘要:
A technique for decoding an instruction in an a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
摘要:
A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
摘要:
A system, techniques and apparatus are described for decoding an instruction in an a variable-length instruction set. An instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
摘要:
An apparatus and method for efficient matrix alignment in a systolic array. For example, one embodiment of a processor comprises: a first set of physical tile registers to store first matrix data in rows or columns; a second set of physical tile registers to store second matrix data in rows or columns; a decoder to decode a matrix instruction identifying a first input matrix, a first offset, a second input matrix, and a second offset; and execution circuitry, responsive to the matrix instruction, to read a subset of rows or columns from the first set of physical tile registers in accordance with the first offset, spanning multiple physical tile registers from the first set if indicated by the first offset to generate a first input matrix and the execution circuitry to read a subset of rows or columns from the second set of physical tile registers in accordance with the second offset, spanning multiple physical tile registers from the second set if indicated by the second offset to generate a second input matrix; and the execution circuitry to perform an arithmetic operation with the first and second input matrices in accordance with an opcode of the matrix instruction.
摘要:
A method of an aspect includes receiving a floating point rounding instruction. The floating point rounding instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point that each of the one or more floating point data elements are to be rounded to, and indicates a destination storage location. A result is stored in the destination storage location in response to the floating point rounding instruction. The result includes one or more rounded result floating point data elements. Each of the one or more rounded result floating point data elements includes one of the floating point data elements of the source, in a corresponding position, which has been rounded to the indicated number of fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
摘要:
A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
摘要:
A method of an aspect includes receiving a floating point rounding instruction. The floating point rounding instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point that each of the one or more floating point data elements are to be rounded to, and indicates a destination storage location. A result is stored in the destination storage location in response to the floating point rounding instruction. The result includes one or more rounded result floating point data elements. Each of the one or more rounded result floating point data elements includes one of the floating point data elements of the source, in a corresponding position, which has been rounded to the indicated number of fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
摘要:
A method includes maintaining an indication of a pending event with respect to each of a number of threads supported within a multithreaded processor. An indication is also maintained of an active or inactive state for each of the multiple threads. A clock disable condition is detected. This clock disable condition may be indicated by the absence of pending events with respect to each of the multiple threads and an inactive state for each of the multiple threads. A clocks signal, if enabled, is then disabled with respect to at least one functional unit within the multithreaded processor responsive to the detection of the clock disable condition.
摘要:
Embodiments of systems, apparatuses, and methods for performing in a computer processor a data element shuffle and an operation on the shuffled data elements in response to a single data element shuffle and an operation instruction that includes a destination vector register operand, a first and second source vector register operands, an immediate value, and an opcode are described.
摘要:
A method includes maintaining an indication of a pending event with respect to each of a number of threads supported within a multithreaded processor. An indication is also maintained of an active or inactive state for each of the multiple threads. A clock disable condition is detected. This clock disable condition may be indicated by the absence of pending events with respect to each of the multiple threads and an inactive state for each of the multiple threads. A clocks signal, if enabled, is then disabled with respect to at least one functional unit within the multithreaded processor responsive to the detection of the clock disable condition.