Patent search ap:("INTEL CORPORATION") AND inv:"Elmoustapha OULD-AHMED-VALL" Page 9

81.

Invention publication
SYSTEMS, METHODS, AND APPARATUSES FOR TILE LOAD (Under examination - published)

Publication (announcement) number: US20230236833A1

Publication (announcement) date: 2023-07-27

Application number: US18100194

Application date: 2023-01-23

Applicant: Intel Corporation

Inventor: Robert VALENTINE, Menachem ADELMAN, Milind B. GIRKAR, Zeev SPERBER, Mark J. CHARNEY, Bret L. TOLL, Rinat RAPPOPORT, Jesus CORBAL, Stanislav SHWARTSMAN, Dan BAUM, Igor YANOVER, Alexander F. HEINECKE, Barukh ZIV, Elmoustapha OULD-AHMED-VALL, Yuri GEBIL

IPC: G06F9/30, G06F7/485, G06F17/16, G06F7/76, G06F7/487, G06F9/38

CPC classification number: G06F9/30036, G06F7/485, G06F17/16, G06F9/30112, G06F7/762, G06F9/3016, G06F9/30196, G06F9/30043, G06F9/30109, G06F9/30185, G06F7/4876, G06F9/30145, G06F9/30134, G06F9/30149, G06F9/3836, G06F9/30032, G06F9/3001, G06F9/3818, G06F2212/454

Abstract: Embodiments detailed herein relate to matrix operations, in particular the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand.
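The strided load described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: memory is modeled as a flat Python list, and the names `tile_load`, `base`, `stride`, `rows`, and `cols` are hypothetical and do not come from the patent.

```python
def tile_load(memory, base, stride, rows, cols):
    """Model a tile load: each configured row receives `cols` consecutive
    elements, and consecutive rows start `stride` elements apart in memory.
    All parameter names are illustrative assumptions."""
    return [
        memory[base + r * stride : base + r * stride + cols]
        for r in range(rows)
    ]


# Flat memory holding the values 0..31.
memory = list(range(32))

# Load a 3x4 tile whose rows start at offsets 2, 10, and 18.
tile = tile_load(memory, base=2, stride=8, rows=3, cols=4)
print(tile)
```

Here the stride (8) is larger than the row width (4), so the tile picks out a sub-block of a wider row-major matrix.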

82.

Invention application
SYSTEMS AND METHODS TO SKIP INCONSEQUENTIAL MATRIX OPERATIONS (In force)

Publication (announcement) number: US20230070579A1

Publication (announcement) date: 2023-03-09

Application number: US17878427

Application date: 2022-08-01

Applicant: Intel Corporation

Inventor: Elmoustapha OULD-AHMED-VALL, William RASH, Subramaniam MAIYURAN, Varghese GEORGE, Rajesh SANKARAN

IPC: G06F9/30

Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results; scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instructions as per the opcode.
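A minimal software sketch of the multiply-accumulate with skipping follows. It assumes that "inconsequential" means a product in which at least one multiplicand is zero; the abstract does not spell out the detection rule, so that criterion and the function name `matmul_accumulate_skip_zeros` are assumptions made for illustration.

```python
def matmul_accumulate_skip_zeros(a, b, c):
    """Accumulate a @ b into c in place, skipping products that cannot
    change the result. Assumption: a product is inconsequential when
    either multiplicand is zero. Returns the number of skipped products."""
    skipped = 0
    for m in range(len(a)):            # rows of the first source
        for n in range(len(b[0])):     # columns of the second source
            for k in range(len(b)):    # shared dimension
                if a[m][k] == 0 or b[k][n] == 0:
                    skipped += 1
                    continue
                c[m][n] += a[m][k] * b[k][n]
    return skipped


a = [[1, 0], [0, 2]]
b = [[3, 4], [0, 5]]
c = [[1, 1], [1, 1]]   # previous contents of the destination
skipped = matmul_accumulate_skip_zeros(a, b, c)
print(c, skipped)
```

In this example five of the eight products are skipped, and the destination holds the same values a full multiply-accumulate would produce.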

83.

Invention application
SYSTEMS, METHODS, AND APPARATUSES FOR MATRIX ADD, SUBTRACT, AND MULTIPLY (In force)

Publication (announcement) number: US20220171623A1

Publication (announcement) date: 2022-06-02

Application number: US17548214

Application date: 2021-12-10

Applicant: Intel Corporation

Inventor: Robert VALENTINE, Dan BAUM, Zeev SPERBER, Jesus CORBAL, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Mark J. CHARNEY, Barukh ZIV, Alexander HEINECKE, Milind GIRKAR, Simon RUBANOVICH

IPC: G06F9/30, G06F7/485, G06F7/487, G06F17/16, G06F7/76, G06F9/38

Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for matrix (tile) addition, subtraction, and multiplication is described. For example, circuitry to support instructions for element-by-element matrix (tile) addition, subtraction, and multiplication are detailed. In some embodiments, for matrix (tile) addition, decode circuitry is to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry is to execute the decoded instruction to, for each data element position of the identified first source matrix operand: add a first data value at that data element position to a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the addition into a corresponding data element position of the identified destination matrix operand.

84.

Invention application
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO TRANSFORM MATRICES INTO ROW-INTERLEAVED FORMAT (In force)

Publication (announcement) number: US20210216323A1

Publication (announcement) date: 2021-07-15

Application number: US17216635

Application date: 2021-03-29

Applicant: Intel Corporation

Inventor: Raanan SADE, Robert VALENTINE, Bret TOLL, Christopher J. HUGHES, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY

IPC: G06F9/30

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.

85.

Invention application
SYSTEMS, METHODS, AND APPARATUSES FOR TILE STORE (Under examination - published)

Publication (announcement) number: US20200233666A1

Publication (announcement) date: 2020-07-23

Application number: US16487755

Application date: 2017-07-01

Applicant: Intel Corporation

Inventor: Robert VALENTINE, Menachem ADELMAN, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Milind B. GIRKAR, Zeev SPERBER, Mark J. CHARNEY, Rinat RAPPOPORT, Jesus CORBAL, Stanislav SHWARTSMAN, Igor YANOVER, Alexander F. HEINECKE, Barukh ZIV, Dan BAUM, Yuri GEBIL

IPC: G06F9/30, G06F17/16

Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.

86.

Invention application
SYSTEMS AND METHODS TO ACCELERATE MULTIPLICATION OF SPARSE MATRICES (Under examination - published)

Publication (announcement) number: US20200210517A1

Publication (announcement) date: 2020-07-02

Application number: US16234374

Application date: 2018-12-27

Applicant: Intel Corporation

Inventor: Dan BAUM, Chen KOREN, Elmoustapha OULD-AHMED-VALL, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE

IPC: G06F17/16, G06F9/38, G06F9/30

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.

87.

Invention application
DATA ELEMENT COMPARISON PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS (Under examination - published)

Publication (announcement) number: US20200089494A1

Publication (announcement) date: 2020-03-19

Application number: US16579394

Application date: 2019-09-23

Applicant: Intel Corporation

Inventor: Asit K. MISHRA, Edward T. GROCHOWSKI, Jonathan D. PEARCE, Deborah T. MARR, Ehud COHEN, Elmoustapha OULD-AHMED-VALL, Jesus Corbal SAN ADRIAN, Robert VALENTINE, Mark J. CHARNEY, Christopher J. HUGHES, Milind B. GIRKAR

IPC: G06F9/30

Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.
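The result mask described here can be modeled in a few lines. This is a sketch only: packed operands are represented as Python lists, mask elements as 0/1 integers, and the name `any_equal_mask` is hypothetical rather than taken from the patent.

```python
def any_equal_mask(src1, src2):
    """For each element of src1, emit 1 if it equals any element of src2,
    else 0. The mask element sits at the same relative position as the
    src1 element it describes."""
    return [int(any(x == y for y in src2)) for x in src1]


# Two packed operands with four data elements each.
mask = any_equal_mask([7, 3, 9, 3], [3, 5, 7, 1])
print(mask)
```

The abstract allows more than one result mask; the mask for the other operand can be obtained in this model by swapping the two arguments.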

88.

Invention application
SYSTEMS, APPARATUSES, AND METHODS FOR VECTOR-PACKED FRACTIONAL MULTIPLICATION OF SIGNED WORDS WITH ROUNDING, SATURATION, AND HIGH-RESULT SELECTION (Under examination - published)

Publication (announcement) number: US20200073635A1

Publication (announcement) date: 2020-03-05

Application number: US16613529

Application date: 2017-06-29

Applicant: Intel Corporation

Inventor: Venkateswara R. MADDURI, Elmoustapha OULD-AHMED-VALL, Robert VALENTINE, Jesus CORBAL, Mark J. CHARNEY, Carl MURRAY, Milind GIRKAR, Bret TOLL

IPC: G06F7/499

Abstract: Embodiments of systems, apparatuses, and methods for vector-packed fractional multiplication of signed words with rounding, saturation, and high-result selection in a processor are described. For example, execution circuitry executes a decoded instruction to perform a fractional multiplication operation for each of a plurality of pairs of packed data elements to yield a plurality of output values, round each of the plurality of output values, detect whether any of the plurality of output values reflect an overflow or underflow, for any of the plurality of output values that reflect an overflow or underflow, saturate the output value, and store the plurality of output values into a corresponding plurality of positions of the packed data destination operand.
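The multiply, round, saturate, and store sequence can be sketched in software. The abstract does not fix a number format, so this example assumes signed 16-bit words interpreted as Q15 fractions and round-half-up on the discarded bits; the function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def q15_mul_round_sat(a, b):
    """Multiply two Q15 values (assumed format), round, keep the high
    part of the product, and saturate to the signed 16-bit range."""
    product = a * b                          # Q30 intermediate
    rounded = (product + (1 << 14)) >> 15    # add half an LSB, drop low bits
    return max(INT16_MIN, min(INT16_MAX, rounded))


def packed_q15_mul(src1, src2):
    """Apply the fractional multiply to each pair of packed elements."""
    return [q15_mul_round_sat(a, b) for a, b in zip(src1, src2)]


# 0.5 * 0.5, then (-1.0) * (-1.0), which overflows Q15 and saturates,
# then a small product that rounds down.
result = packed_q15_mul([16384, -32768, 32767], [16384, -32768, 2])
print(result)
```

The second pair shows why saturation is needed: the exact product is +1.0, which is one step beyond the largest representable Q15 value.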

89.

Invention application
SYSTEMS, APPARATUSES, AND METHODS FOR FUSED MULTIPLY ADD (Under examination - published)

Publication (announcement) number: US20200026515A1

Publication (announcement) date: 2020-01-23

Application number: US16338324

Application date: 2016-10-20

Applicant: Intel Corporation

Inventor: Robert VALENTINE, Galina RYVCHIN, Piotr MAJCHER, Mark J. CHARNEY, Elmoustapha OULD-AHMED-VALL, Jesus CORBAL, Milind B. GIRKAR, Zeev SPERBER, Simon RUBANOVICH, Amit GRADSTEIN

IPC: G06F9/30, G06F9/38, G06F7/544

Abstract: In some embodiments, packed data elements of first and second packed data source operands are of a first, different size than a second size of packed data elements of a third packed data operand. Execution circuitry executes a decoded single instruction to perform, for each packed data element position of a destination operand, a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, addition of results from these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element divided by N.
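As an illustration of the operation in this abstract, the sketch below multiplies M narrow element pairs per destination position and adds the products to the corresponding full-sized element of the third source. It assumes M = 2 (for example, 16-bit sources feeding 32-bit accumulators) and uses unbounded Python integers, so overflow and saturation behavior is not modeled; the name `fused_multiply_add` and its parameters are hypothetical.

```python
def fused_multiply_add(src1, src2, src3, m):
    """For each full-sized element of src3, multiply the m corresponding
    narrow element pairs from src1 and src2, add the products to the
    src3 element, and store the sum at the same destination position.
    Integer width limits are deliberately not modeled."""
    dest = []
    for i, acc in enumerate(src3):
        products = (src1[i * m + j] * src2[i * m + j] for j in range(m))
        dest.append(acc + sum(products))
    return dest


# Four narrow elements per source, two full-sized accumulators, so M = 2.
dest = fused_multiply_add([1, 2, 3, 4], [5, 6, 7, 8], [100, 200], m=2)
print(dest)
```

The first destination element is 100 + 1*5 + 2*6 and the second is 200 + 3*7 + 4*8.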

90.

Invention application
SYSTEMS, METHODS, AND APPARATUSES FOR VECTOR BROADCAST (Under examination - published)

Publication (announcement) number: US20190205131A1

Publication (announcement) date: 2019-07-04

Application number: US15858278

Application date: 2017-12-29

Applicant: Intel Corporation

Inventor: Maciej URBANSKI, Elmoustapha OULD-AHMED-VALL

IPC: G06F9/30

CPC classification number: G06F9/3016, G06F9/3001, G06F9/30036

Abstract: Systems, methods, and apparatuses for broadcasting a selected data element and performing an operation in response to a single instruction are described. For example, a processor comprising decode circuitry to decode an instruction having fields for an opcode, at least two packed data source operand identifiers, a packed data destination operand identifier, and an immediate, and execution circuitry to execute the decoded instruction to: broadcast a packed data element from the identified first packed data source operand, wherein the packed data element position to be broadcast is selected based on a value of the immediate, perform operations according to the opcode on the broadcasted packed data element from the identified first packed data source operand and packed data elements of the identified second packed data source operand is described.
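A small model of the broadcast-then-operate behavior follows. It is a sketch under assumptions: the immediate is treated as a plain element index (wrapped to the operand length), the opcode is modeled as a Python callable, and the name `broadcast_op` is hypothetical.

```python
import operator


def broadcast_op(src1, src2, imm, op):
    """Select one element of src1 by the immediate, broadcast it to every
    position, and combine it with each element of src2 using `op`.
    Treating the immediate as a wrapped index is an assumption."""
    selected = src1[imm % len(src1)]
    return [op(selected, y) for y in src2]


# Broadcast element 2 of the first source (value 30) and add it to
# every element of the second source.
dest = broadcast_op([10, 20, 30, 40], [1, 2, 3, 4], imm=2, op=operator.add)
print(dest)
```

Passing `operator.mul` or another two-argument callable in place of `operator.add` models a different opcode with the same broadcast step.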
