-
公开(公告)号:US20220100513A1
公开(公告)日:2022-03-31
申请号:US17134085
申请日:2020-12-24
Applicant: Intel Corporation
Inventor: CHRISTOPHER J. HUGHES , ALEXANDER HEINECKE , ROBERT VALENTINE , MENACHEM ADELMAN , EVANGELOS GEORGANAS , MARK CHARNEY
Abstract: Systems, methods, and apparatuses relating to one or more instructions that load data into a tile register and pad a row (or column) with a pad value from a padding circuit are described. In one embodiment, a system includes a matrix operations accelerator circuit comprising a two-dimensional grid of processing elements, a tile register that represents a two-dimensional matrix coupled to the matrix operations accelerator circuit, and a coupling to a memory, a padding circuit coupled to the tile register, and a hardware processor core including a decoder, of the hardware processor core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction, the single instruction comprising a first field that identifies the tile register, a second field that identifies data elements in the memory, and an opcode, the opcode to indicate an execution circuit of the hardware processor core is to cause a load of the data elements from the memory into the tile register and the padding circuit to pad a proper subset of elements of the tile register with a same value, and the execution circuit of the hardware processor core to execute the decoded single instruction according to the opcode.
-
2.
公开(公告)号:US20220100507A1
公开(公告)日:2022-03-31
申请号:US17134046
申请日:2020-12-24
Applicant: Intel Corporation
Inventor: ALEXANDER F. HEINECKE , ROBERT VALENTINE , MARK J. CHARNEY , MENACHEM ADELMAN , CHRISTOPHER J. HUGHES , EVANGELOS GEORGANAS , ZEEV SPERBER , AMIT GRADSTEIN , SIMON RUBANOVICH
Abstract: Systems, methods, and apparatuses relating to instructions to convert 16-bit floating-point formats are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a source vector comprising N plurality of 16-bit half-precision floating-point elements, and a destination vector to store N plurality of 16-bit bfloat floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the source vector from 16-bit half-precision floating-point format to 16-bit bfloat floating-point format and store each converted element into a corresponding location of the destination vector, decode circuitry to decode the fetched single instruction into a decoded single instruction, and the execution circuitry to respond to the decoded single instruction as specified by the opcode.
-
公开(公告)号:US20220100502A1
公开(公告)日:2022-03-31
申请号:US17134008
申请日:2020-12-24
Applicant: Intel Corporation
Inventor: ALEXANDER F. HEINECKE , ROBERT VALENTINE , MARK J. CHARNEY , MENACHEM ADELMAN , CHRISTOPHER J. HUGHES , EVANGELOS GEORGANAS , ZEEV SPERBER , AMIT GRADSTEIN , SIMON RUBANOVICH
Abstract: Systems, methods, and apparatuses relating to 16-bit floating-point matrix dot product instructions are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a M by N destination matrix having single-precision elements, an M by K first source matrix, and a K by N second source matrix, the source matrices having elements that each comprise a pair of half-precision floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the half-precision floating-point values to single-precision values, a multiplication of converted single-precision values from first values of the pairs together to generate a first result, a multiplication of converted single-precision values from second values of the pairs together to generate a second result, and an accumulation of the first result and the second result with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
-
-