Abstract:
Techniques for prefetching random data and instructions using implicitly referenced random number data are described. An example includes decode circuitry to decode a single instruction having at least a field for an opcode, the opcode to indicate that execution circuitry is to perform an operation using implicitly referenced random data; and execution circuitry to execute the decoded single instruction according to the opcode.
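A minimal software sketch of the idea, assuming a GCC/Clang toolchain: the instruction names no explicit source operand, so the model below stands in for an implicit hardware random source with rand() and for the opcode-triggered prefetch with __builtin_prefetch. The function name and the 64-byte line size are illustrative assumptions, not part of the patent.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: the "operand" is random data produced implicitly by a
 * hardware RNG; rand() stands in for that implicit source here.
 * region_size is assumed to be a non-zero multiple of 64. */
static void model_prefetch_with_implicit_random(uint8_t *region, size_t region_size)
{
    uint64_t implicit_random = (uint64_t)rand();                  /* implicit random data */
    size_t   line = (implicit_random % (region_size / 64)) * 64;  /* pick a cache line     */
    /* Compiler builtin used as a stand-in for the prefetch the opcode would trigger. */
    __builtin_prefetch(&region[line], /*rw=*/0, /*locality=*/1);
}
```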
Abstract:
Systems, methods, and apparatuses for vector bit test are described. In some embodiments, a vector bit test instruction is executed to shift each packed data element of a first source by a number of bits indicated by a corresponding packed data element of a second source, and store consecutive bit values from each packed data element of the first source at the identified bit positions of a corresponding packed data element of a destination.
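As a rough scalar model of one flavor of this vector bit test, the sketch below shifts each 64-bit lane of the first source by the count held in the matching lane of the second source and writes the tested bit into the matching destination lane. The lane width, the single-bit result, and the destination layout are assumptions; the abstract covers other packings of consecutive bit values as well.

```c
#include <stdint.h>

/* Per-lane variable shift and bit extraction, modeled on 64-bit lanes. */
static void vbittest_model(uint64_t dst[], const uint64_t src1[],
                           const uint64_t src2[], int lanes)
{
    for (int i = 0; i < lanes; i++) {
        unsigned count = (unsigned)(src2[i] & 63);   /* per-lane shift count */
        dst[i] = (src1[i] >> count) & 1u;            /* tested bit value     */
    }
}
```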
Abstract:
Embodiments detailed herein relate to matrix operations. For example, in some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, an identifier of a first source matrix operand, an identifier of a second source matrix operand, and an identifier of a source/destination matrix operand, and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, and store a result of the addition in the identified source/destination matrix operand.
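A scalar reference for the described multiply-accumulate, assuming M×K, K×N, and M×N operand shapes, row-major float storage, and the hypothetical function name below; the actual instruction operates on matrix (tile) registers rather than memory arrays.

```c
/* srcdst += src1 * src2, with src1 M x K, src2 K x N, srcdst M x N. */
static void tile_fma_model(float *srcdst, const float *src1, const float *src2,
                           int M, int K, int N)
{
    for (int m = 0; m < M; m++)
        for (int n = 0; n < N; n++) {
            float acc = srcdst[m * N + n];           /* existing accumulator value */
            for (int k = 0; k < K; k++)
                acc += src1[m * K + k] * src2[k * N + n];
            srcdst[m * N + n] = acc;                 /* store multiply-add result  */
        }
}
```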
Abstract:
Embodiments detailed herein relate to matrix operations. For example, an apparatus comprises decode circuitry to decode an instruction and execution circuitry, coupled with the decode circuitry. The instruction has fields to indicate a first M row by K column (MxK) matrix, a second K row by N column (KxN) matrix, and a third M row by N column (MxN) matrix. The first MxK matrix has data elements of a first size, the second KxN matrix has data elements of the first size, and the third MxN matrix has data elements of a second size four times the first size. The execution circuitry performs operations corresponding to the instruction, including to: for each row of the first MxK matrix, and each column of the second KxN matrix: generate a dot-product from all data elements of the row of the first MxK matrix and all data elements of the column of the second KxN matrix, and accumulate the dot-product with a data element from a corresponding row and a corresponding column of the third MxN matrix.
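The sketch below is a scalar model of the dot-product-and-accumulate described above, assuming 8-bit source elements (so the accumulator elements are 32-bit, four times the size). Signedness, saturation behavior, and the function name are assumptions of this sketch.

```c
#include <stdint.h>

/* For each row of A (M x K) and column of B (K x N), form the dot-product
 * and accumulate it into the matching element of C (M x N). */
static void tile_dp_model(int32_t *c, const int8_t *a, const int8_t *b,
                          int M, int K, int N)
{
    for (int m = 0; m < M; m++)
        for (int n = 0; n < N; n++) {
            int32_t acc = c[m * N + n];
            for (int k = 0; k < K; k++)                       /* dot-product */
                acc += (int32_t)a[m * K + k] * (int32_t)b[k * N + n];
            c[m * N + n] = acc;                               /* accumulate  */
        }
}
```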
Abstract:
A processor includes a decoder to receive an instruction that indicates first and second source packed data operands and at least one shift count. An execution unit is operable, in response to the instruction, to store a result packed data operand. Each result data element includes a first least significant bit (LSB) portion of a first data element of a corresponding pair of data elements in a most significant bit (MSB) portion of the result data element, and a second MSB portion of a second data element of the corresponding pair in an LSB portion of the result data element. One of the first LSB portion of the first data element and the second MSB portion of the second data element has a number of bits equal to the corresponding shift count. The other has a number of bits equal to the size of a data element of the first source packed data operand minus the corresponding shift count.
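One lane pair of this concatenated shift can be modeled as a left "funnel" shift: the low (64 − count) bits of the first element land in the high bits of the result, and the high count bits of the second element land in the low bits. The 64-bit lane width, the choice of this variant over the mirrored assignment of portions, and the function name are assumptions of the sketch.

```c
#include <stdint.h>

/* result = low (64 - count) bits of elem1 in the MSB portion,
 *          high count bits of elem2 in the LSB portion. */
static uint64_t concat_shift_left(uint64_t elem1, uint64_t elem2, unsigned count)
{
    count &= 63;
    if (count == 0)
        return elem1;                              /* avoid undefined 64-bit shift */
    return (elem1 << count) | (elem2 >> (64 - count));
}
```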
Abstract:
Embodiments detailed herein relate to matrix operations. For example, in some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier, wherein each of the first source matrix operand, the second source matrix operand, and the destination matrix operand corresponds to a two-dimensional matrix of values, and execution circuitry to execute the decoded instruction to, for each data element position of the identified first source matrix operand: multiply a first data value at that data element position by a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the multiplication into a corresponding data element position of the identified destination matrix operand.
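A scalar reference for the described element-wise (per-position) matrix multiply, assuming row-major float storage and the hypothetical names below.

```c
/* dst[i][j] = src1[i][j] * src2[i][j] for every data element position. */
static void tile_elementwise_mul_model(float *dst, const float *src1,
                                       const float *src2, int rows, int cols)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            dst[i * cols + j] = src1[i * cols + j] * src2[i * cols + j];
}
```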
Abstract:
One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands, including an input value and a quantized weight value associated with a neural network, and an arithmetic logic unit including a barrel shifter, an adder, and an accumulator register, wherein, to execute the decoded instruction, the barrel shifter is to shift the input value by the quantized weight value to generate a shifted input value and the adder is to add the shifted input value to the value stored in the accumulator register and update the value stored in the accumulator register.
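The ALU path can be modeled as below: because the quantized weight encodes a power-of-two scale, the multiplication reduces to a barrel shift, and the shifted input is added into the accumulator. Data widths, the unsigned weight encoding, and the function name are assumptions of this sketch.

```c
#include <stdint.h>

/* One shift-and-accumulate step: acc += input << quantized_weight. */
static int64_t shift_accumulate(int64_t accumulator, int32_t input, uint8_t quantized_weight)
{
    int64_t shifted = (int64_t)input << (quantized_weight & 63);  /* barrel shifter     */
    return accumulator + shifted;                                 /* adder + accumulator */
}
```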