-
Publication No.: US20210349720A1
Publication date: 2021-11-11
Application No.: US17382917
Filing date: 2021-07-22
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Zeev SPERBER , Mark J. CHARNEY , Bret L. TOLL , Rinat RAPPOPORT , Stanislav SHWARTSMAN , Dan BAUM , Igor YANOVER , Elmoustapha OULD-AHMED-VALL , Menachem ADELMAN , Jesus CORBAL , Yuri GEBIL , Simon RUBANOVICH
Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, and store a result of the addition in the identified source/destination matrix operand and zero unconfigured columns of identified source/destination matrix operand are detailed.
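The multiply-accumulate behavior described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the function name `tile_multiply_accumulate`, the list-of-lists tile layout, the `rows`/`cols`/`inner` configuration parameters, and the reading of "negated" as subtracting the product are all assumptions made for the example.

```python
def tile_multiply_accumulate(a, b, c, rows, cols, inner, negate=False):
    """Hypothetical sketch: C += A x B over the configured region of a tile.

    a, b, c are lists of lists whose physical width may exceed the
    configured width. Columns of c at index >= cols are treated as
    unconfigured and zeroed (assumed reading of the abstract).
    """
    for m in range(rows):
        for n in range(cols):
            product = 0
            for k in range(inner):
                product += a[m][k] * b[k][n]
            # Assumption: the negated variant subtracts the product.
            c[m][n] += -product if negate else product
        # Zero the unconfigured columns of the source/destination tile.
        for n in range(cols, len(c[m])):
            c[m][n] = 0
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1, 9], [1, 1, 9]]  # third column lies outside the configured width
result = tile_multiply_accumulate(a, b, c, rows=2, cols=2, inner=2)
```

Here the third column of `c` stands in for storage beyond the configured tile width, so it is cleared while the first two columns receive the accumulated products.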
-
Publication No.: US20200233667A1
Publication date: 2020-07-23
Application No.: US16487787
Filing date: 2017-07-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Zeev SPERBER , Mark J. CHARNEY , Bret L. TOLL , Rinat RAPPOPORT , Stanislav SHWARTSMAN , Dan BAUM , Igor YANOVER , Elmoustapha OULD-AHMED-VALL , Menachem ADELMAN , Jesus CORBAL , Yuri GEBIL , Simon RUBANOVICH
Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, and store a result of the addition in the identified source/destination matrix operand and zero unconfigured columns of identified source/destination matrix operand are detailed.
-
Publication No.: US20220414182A1
Publication date: 2022-12-29
Application No.: US17359519
Filing date: 2021-06-26
Applicant: Intel Corporation
Inventor: Menachem ADELMAN , Robert VALENTINE , Zeev SPERBER , Amit GRADSTEIN , Simon RUBANOVICH , Sagi MELLER , Christopher HUGHES , Evangelos GEORGANAS , Alexander HEINECKE , Mark CHARNEY
Abstract: Techniques for matrix multiplication are described. In some examples, decode circuitry is to decode a single instruction having fields for an opcode, an indication of a location of a first source operand, an indication of a location of a second source operand, and an indication of a location of a destination operand, wherein the opcode is to indicate that execution circuitry is to at least convert data elements of the first and second source operands from a first floating point representation to a second floating point representation, perform matrix multiplication with the converted data elements, and accumulate results of the matrix multiplication in the destination operand in the first floating point representation; and the execution circuitry is to execute the decoded instruction as specified by the opcode.
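The convert, multiply, and accumulate flow in this abstract can be sketched as below. The abstract does not name the two floating-point representations, so the example assumes, purely for illustration, that the first is IEEE single precision and the second is IEEE half precision. The helper names `to_f32`, `to_f16`, and `convert_matmul_accumulate` are hypothetical.

```python
import struct


def to_f32(x):
    """Round a Python float to IEEE single precision (assumed first format)."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


def to_f16(x):
    """Round a Python float to IEEE half precision (assumed second format)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


def convert_matmul_accumulate(a, b, c):
    """Hypothetical sketch: convert sources, multiply, accumulate into c."""
    rows, inner, cols = len(a), len(b), len(b[0])
    for m in range(rows):
        for n in range(cols):
            acc = c[m][n]
            for k in range(inner):
                # Convert both source elements before multiplying.
                product = to_f16(a[m][k]) * to_f16(b[k][n])
                # Keep the running sum in the first (wider) format.
                acc = to_f32(acc + product)
            c[m][n] = acc
    return c


result = convert_matmul_accumulate([[1.0, 2.0]], [[0.5], [0.25]], [[1.0]])
```

The conversion step is lossy by design: a value such as `1.0 + 2**-12` becomes `1.0` in the narrower format before it is multiplied.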
-
Publication No.: US20220326948A1
Publication date: 2022-10-13
Application No.: US17851468
Filing date: 2022-06-28
Applicant: Intel Corporation
Inventor: Alexander F. HEINECKE , Robert VALENTINE , Mark J. CHARNEY , Raanan SADE , Menachem ADELMAN , Zeev SPERBER , Amit GRADSTEIN , Simon RUBANOVICH
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
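One way to picture the element-wise conversion is the sketch below. The abstract says only "16-bit floating-point", so the bfloat16 layout (the upper 16 bits of a single-precision value) and round-to-nearest-even are assumptions, as are the function names.

```python
import struct


def f32_bits(x):
    """Return the 32-bit pattern of x rounded to IEEE single precision."""
    return struct.unpack('<I', struct.pack('<f', x))[0]


def f32_to_bf16_bits(x):
    """Hypothetical sketch: single precision to bfloat16 bits, nearest-even."""
    bits = f32_bits(x)
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Truncate and force a quiet NaN so the payload cannot round to infinity.
        return (bits >> 16) | 0x0040
    # Add 0x7FFF plus the lowest kept bit so that exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def convert_vector(src):
    """Convert each of the N source elements to a 16-bit destination element."""
    return [f32_to_bf16_bits(x) for x in src]


dst = convert_vector([1.0, -2.0, 1.0 + 2**-8, 1.0 + 2**-7 + 2**-8])
```

The last two inputs are exact ties between two representable 16-bit values; under the assumed rounding mode each resolves to the neighbor with an even low bit.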
-
Publication No.: US20210286620A1
Publication date: 2021-09-16
Application No.: US17216566
Filing date: 2021-03-29
Applicant: Intel Corporation
Inventor: Alexander F. HEINECKE , Robert VALENTINE , Mark J. CHARNEY , Raanan SADE , Menachem ADELMAN , Zeev SPERBER , Amit GRADSTEIN , Simon RUBANOVICH
Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
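The per-element flow in this abstract can be sketched as follows. The abstract does not state whether the nibbles are signed or how wide the saturation range is, so the example assumes unsigned nibbles and signed 32-bit saturation. All function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def nibbles(dword):
    """Split a 32-bit doubleword into eight 4-bit values, lowest nibble first."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def saturate_int32(value):
    """Clamp to the signed 32-bit range (assumed saturation behavior)."""
    return max(INT32_MIN, min(INT32_MAX, value))


def tile_nibble_dot_product(dst, src1, src2):
    """Hypothetical sketch: dst[m][n] accumulates K rounds of eight products."""
    rows, inner, cols = len(src1), len(src2), len(src2[0])
    for m in range(rows):
        for n in range(cols):
            acc = dst[m][n]
            for k in range(inner):
                products = [x * y for x, y in
                            zip(nibbles(src1[m][k]), nibbles(src2[k][n]))]
                # Accumulate the eight products with the previous contents.
                acc = saturate_int32(acc + sum(products))
            dst[m][n] = acc
    return dst


result = tile_nibble_dot_product([[5]], [[0x11111111]], [[0x22222222]])
```

With every nibble of the first source equal to 1 and every nibble of the second equal to 2, the eight products sum to 16, which is added to the previous destination value of 5.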
-
Publication No.: US20200310793A1
Publication date: 2020-10-01
Application No.: US16369743
Filing date: 2019-03-29
Applicant: Intel Corporation
Inventor: Simon RUBANOVICH , Amit GRADSTEIN , Zeev SPERBER
IPC: G06F9/30
Abstract: Disclosed embodiments relate to an interleaved pipeline of floating-point (FP) adders. In one example, a processor is to execute an instruction specifying an opcode and locations of a M by K first source matrix, a K by N second source matrix, and a M by N destination matrix, the opcode indicating execution circuitry, for each FP element (M, N) of the destination matrix, is to: launch K instances of a pipeline having a first, MULTIPLY stage, during which a FP element (M, K) of the first source matrix and a corresponding FP element (K, N) of the second source matrix are multiplied; concurrently, in an EXPDIFF stage, determine an exponent difference between the product and a previous FP value of the element (M, N) of the destination matrix; and in a second, ADD-BYPASS stage, accumulate the product with the previous FP value and, concurrently, bypassing the accumulated sum to a subsequent pipeline instance.
-
Publication No.: US20200310757A1
Publication date: 2020-10-01
Application No.: US16369629
Filing date: 2019-03-29
Applicant: Intel Corporation
Inventor: Amit GRADSTEIN , Simon RUBANOVICH , Zeev SPERBER
Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.
-
Publication No.: US20240078283A1
Publication date: 2024-03-07
Application No.: US18360793
Filing date: 2023-07-27
Applicant: Intel Corporation
Inventor: Amit GRADSTEIN , Simon RUBANOVICH , Sagi MELLER , Saeed KHAROUF , Gavri BERGER , Zeev SPERBER , Jose YALLOUZ , Ron SCHNEIDER
CPC classification number: G06F17/16 , G06F9/3001 , G06F9/30036 , G06F9/3851 , G06F9/34
Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable to a scheduling mode for execution of a decoded single instruction where the matrix operations accelerator circuit loads a first buffer of the two-dimensional grid of fused multiply accumulate circuits from a first plurality of registers that represents a first input two-dimensional matrix, checks if a second buffer of the two-dimensional grid of fused multiply accumulate circuits stores an immediately prior input two-dimension matrix that is the same as a second input two-dimensional matrix from a second plurality of registers that represents the first input two-dimensional matrix, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits stores the immediately prior input two-dimension matrix, from execution of a previous instruction, that is the same as the second input two-dimensional matrix: prevents reclamation of the second buffer between execution of the previous instruction and the decoded single instruction, performs an operation on the first input two-dimensional matrix from the first buffer and the immediately prior input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in resultant storage, and when the second buffer of the two-dimensional grid of fused multiply accumulate circuits does not store the immediately prior input two-dimension matrix, from execution of the previous instruction, that is the same as the second input two-dimensional matrix: loads the second input two-dimensional matrix into the second buffer of the two-dimensional grid of fused multiply accumulate circuits, performs the operation on the first input two-dimensional matrix from the first buffer and the second input two-dimension matrix from the second buffer to produce a resultant, and stores the resultant in the resultant storage.
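The buffer-reuse decision in this abstract can be modeled in software roughly as below. This is a loose sketch under stated assumptions: the class name `MatrixAcceleratorSketch` is hypothetical, "the same" is modeled as value equality rather than whatever tracking the hardware uses, the operation is assumed to be matrix multiplication, and "preventing reclamation" is modeled simply by leaving the buffer contents in place.

```python
class MatrixAcceleratorSketch:
    """Hypothetical model of reusing the second buffer across instructions."""

    def __init__(self):
        self.second_buffer = None       # matrix left by the previous instruction
        self.second_buffer_loads = 0    # counts loads for illustration only

    def execute(self, first, second):
        # The first buffer is always loaded from the first set of registers.
        first_buffer = [row[:] for row in first]
        if self.second_buffer != second:
            # Miss: load the second input matrix into the second buffer.
            self.second_buffer = [row[:] for row in second]
            self.second_buffer_loads += 1
        # Hit: the buffer is kept as-is (not reclaimed) and reused directly.
        b = self.second_buffer
        return [[sum(first_buffer[m][k] * b[k][n] for k in range(len(b)))
                 for n in range(len(b[0]))]
                for m in range(len(first_buffer))]


accel = MatrixAcceleratorSketch()
identity = [[1, 0], [0, 1]]
r1 = accel.execute([[1, 2]], identity)
r2 = accel.execute([[3, 4]], identity)   # second buffer is reused here
loads_after_reuse = accel.second_buffer_loads
r3 = accel.execute([[1, 1]], [[2, 0], [0, 2]])
```

Two consecutive calls that share the same second matrix trigger only one load of the second buffer; a different second matrix forces a reload.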
-
Publication No.: US20220058021A1
Publication date: 2022-02-24
Application No.: US17516023
Filing date: 2021-11-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Dan BAUM , Zeev SPERBER , Jesus CORBAL , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Mark J. CHARNEY , Menachem ADELMAN , Barukh ZIV , Alexander HEINECKE , Simon RUBANOVICH
Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions include computing a dot product of signed words and accumulating in a double word with saturation; computing a dot product of bytes and accumulating into a dword with saturation, where the input bytes can be signed or unsigned and the dword accumulation has output saturation; etc.
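The byte variant mentioned in this abstract can be illustrated for a single pair of doublewords as follows. The four-bytes-per-dword packing, the per-source signedness flags, and the signed 32-bit saturation range are assumptions made for the sketch. The function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def unpack_bytes(dword, signed):
    """Split a 32-bit value into four bytes, lowest first, optionally signed."""
    raw = [(dword >> (8 * i)) & 0xFF for i in range(4)]
    return [b - 256 if signed and b >= 128 else b for b in raw]


def dot_bytes_accumulate(acc, a_dword, b_dword, a_signed=True, b_signed=False):
    """Hypothetical sketch: add a four-byte dot product to acc, then saturate."""
    a_bytes = unpack_bytes(a_dword, a_signed)
    b_bytes = unpack_bytes(b_dword, b_signed)
    total = acc + sum(x * y for x, y in zip(a_bytes, b_bytes))
    return max(INT32_MIN, min(INT32_MAX, total))


signed_result = dot_bytes_accumulate(10, 0xFF010203, 0x01010101)
unsigned_result = dot_bytes_accumulate(10, 0xFF010203, 0x01010101, a_signed=False)
```

The same bit pattern gives different sums depending on whether the top byte `0xFF` is read as -1 or as 255, which is why the signedness of each input matters.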
-
Publication No.: US20210132943A1
Publication date: 2021-05-06
Application No.: US16486960
Filing date: 2017-07-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Dan BAUM , Zeev SPERBER , Jesus CORBAL , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Mark J. CHARNEY , Menachem ADELMAN , Barukh ZIV , Alexander HEINECKE , Simon RUBANOVICH
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions include computing a dot product of signed words and accumulating in a double word with saturation; computing a dot product of bytes and accumulating into a dword with saturation, where the input bytes can be signed or unsigned and the dword accumulation has output saturation; etc.
-