-
公开(公告)号:US20230236833A1
公开(公告)日:2023-07-27
申请号:US18100194
申请日:2023-01-23
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Menachem ADELMAN , Milind B. GIRKAR , Zeev SPERBER , Mark J. CHARNEY , Bret L. TOLL , Rinat RAPPOPORT , Jesus Corbal , Stanislav SHWARTSMAN , Dan BAUM , Igor YANOVER , Alexander F. HEINECKE , Barukh ZIV , Elmoustapha OULD-AHMED-VALL , Yuri GEBIL
CPC classification number: G06F9/30036 , G06F7/485 , G06F17/16 , G06F9/30112 , G06F7/762 , G06F9/3016 , G06F9/30196 , G06F9/30043 , G06F9/30109 , G06F9/30185 , G06F7/4876 , G06F9/30145 , G06F9/30134 , G06F9/30149 , G06F9/3836 , G06F9/30032 , G06F9/3001 , G06F9/3818 , G06F2212/454
Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.
-
公开(公告)号:US20230070579A1
公开(公告)日:2023-03-09
申请号:US17878427
申请日:2022-08-01
Applicant: Intel Corporation
Inventor: Elmoustapha OULD-AHMED-VALL , William RASH , Subramaniam MAIYURAN , Varghese GEORGE , Rajesh SANKARAN
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results; scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instructions as per the opcode.
-
公开(公告)号:US20220171623A1
公开(公告)日:2022-06-02
申请号:US17548214
申请日:2021-12-10
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Dan BAUM , Zeev SPERBER , Jesus CORBAL , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Mark J. CHARNEY , Barukh ZIV , Alexander HEINECKE , Milind GIRKAR , Simon RUBANOVICH
Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for matrix (tile) addition, subtraction, and multiplication is described. For example, circuitry to support instructions for element-by-element matrix (tile) addition, subtraction, and multiplication are detailed. In some embodiments, for matrix (tile) addition, decode circuitry is to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry is to execute the decoded instruction to, for each data element position of the identified first source matrix operand: add a first data value at that data element position to a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the addition into a corresponding data element position of the identified destination matrix operand.
-
84.
公开(公告)号:US20210216323A1
公开(公告)日:2021-07-15
申请号:US17216635
申请日:2021-03-29
Applicant: Intel Corporation
Inventor: Raanan SADE , Robert VALENTINE , Bret TOLL , Christopher J. HUGHES , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
-
公开(公告)号:US20200233666A1
公开(公告)日:2020-07-23
申请号:US16487755
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Menachem ADELMAN , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Milind B. GIRKAR , Zeev SPERBER , Mark J. CHARNEY , Rinat RAPPOPORT , Jesus CORBAL , Stanislav SHWARTSMAN , Igor YANOVER , Alexander F. HEINECKE , Barukh ZIV , Dan BAUM , Yuri GEBIL
Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information
-
公开(公告)号:US20200210517A1
公开(公告)日:2020-07-02
申请号:US16234374
申请日:2018-12-27
Applicant: Intel Corporation
Inventor: Dan BAUM , Chen KOREN , Elmoustapha OULD-AHMED-VALL , Michael ESPIG , Christopher J. HUGHES , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in a subsequent multiplications.
-
公开(公告)号:US20200089494A1
公开(公告)日:2020-03-19
申请号:US16579394
申请日:2019-09-23
Applicant: Intel Corporation
Inventor: Asit K. MISHRA , Edward T. GROCHOWSKI , Jonathan D. PEARCE , Deborah T. MARR , Ehud COHEN , Elmoustapha OULD-AHMED-VALL , Jesus Corbal SAN ADRIAN , Robert VALENTINE , Mark J. CHARNEY , Christopher J. HUGHES , Milind B. GIRKAR
IPC: G06F9/30
Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.
-
公开(公告)号:US20200073635A1
公开(公告)日:2020-03-05
申请号:US16613529
申请日:2017-06-29
Applicant: Intel Corporation
Inventor: Venkateswara R. MADDURI , Elmoustapha OULD-AHMED-VALL , Robert VALENTINE , Jesus CORBAL , Mark J. CHARNEY , Carl MURRAY , Milind GIRKAR , Bret TOLL
IPC: G06F7/499
Abstract: Embodiments of systems, apparatuses, and methods for vector-packed fractional multiplication of signed words with rounding, saturation, and high-result selection in a processor are described. For example, execution circuitry executes a decoded instruction to perform a fractional multiplication operation for each of a plurality of pairs of packed data elements to yield a plurality of output values, round each of the plurality of output values, detect whether any of the plurality of output values reflect an overflow or underflow, for any of the plurality of output values that reflect an overflow or underflow, saturate the output value, and store the plurality of output values into a corresponding plurality of positions of the packed data destination operand.
-
公开(公告)号:US20200026515A1
公开(公告)日:2020-01-23
申请号:US16338324
申请日:2016-10-20
Applicant: Intel Corporation
Inventor: Robert Valentine , Galina RYVCHIN , Piotr MAJCHER , Mark J. CHARNEY , Elmoustapha OULD-AHMED-VALL , Jesus CORBAL , Milind B. GIRKAR , Zeev SPERBER , Simon RUBANOVICH , Amit GRADSTEIN
Abstract: In some embodiments, packed data elements of first and second packed data source operands are of a first, different size than a second size of packed data elements of a third packed data operand. Execution circuitry executes decoded single instruction to perform, for each packed data element position of a destination operand, a multiplication of a M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, add of results from these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element divided by N.
-
公开(公告)号:US20190205131A1
公开(公告)日:2019-07-04
申请号:US15858278
申请日:2017-12-29
Applicant: Intel Corporation
Inventor: Maciej URBANSKI , Elmoustapha OULD-AHMED-VALL
IPC: G06F9/30
CPC classification number: G06F9/3016 , G06F9/3001 , G06F9/30036
Abstract: Systems, methods, and apparatuses for broadcasting a selected data element and performing an operation in response to a single instruction are described. For example, a processor comprising decode circuitry to decode an instruction having fields for an opcode, at least two packed data source operand identifiers, a packed data destination operand identifier, and an immediate, and execution circuitry to execute the decoded instruction to: broadcast a packed data element from the identified first packed data source operand, wherein the packed data element position to be broadcast is selected based on a value of the immediate, perform operations according to the opcode on the broadcasted packed data element from the identified first packed data source operand and packed data elements of the identified second packed data source operand is described.
-
-
-
-
-
-
-
-
-