Patent search ap:("INTEL CORPORATION") AND inv:"Elmoustapha OULD-AHMED-VALL" Page 3

21.

Invention publication

SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO TRANSFORM MATRICES INTO ROW-INTERLEAVED FORMAT

Legal status: Under examination (published)

Publication number: US20230350682A1

Publication date: 2023-11-02

Application number: US18309469

Application date: 2023-04-28

Applicant: Intel Corporation

Inventor: Raanan SADE, Robert VALENTINE, Bret TOLL, Christopher J. HUGHES, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY

IPC: G06F9/30

CPC classification number: G06F9/30167, G06F9/30149, G06F9/30101

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
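The row-interleaving described in this abstract can be illustrated in software. The following is a minimal sketch, not the claimed hardware behavior: it assumes K equals J, that each J-element sub-column fills one destination row in row-major order, and that the source row count is a multiple of J. The function name `row_interleave` is hypothetical.

```python
def row_interleave(src, j):
    """Interleave each J-element sub-column of src into J adjacent columns.

    Assumption (not from the abstract): K == J, so every sub-column occupies
    J consecutive columns of a single destination row.
    """
    rows, cols = len(src), len(src[0])
    if rows % j != 0:
        raise ValueError("row count must be a multiple of j")
    dst = []
    for r0 in range(0, rows, j):          # one destination row per group of J source rows
        out_row = []
        for c in range(cols):             # walk the source columns left to right
            for k in range(j):            # copy the J elements of this sub-column
                out_row.append(src[r0 + k][c])
        dst.append(out_row)
    return dst


if __name__ == "__main__":
    source = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(row_interleave(source, 2))
```

With J = 2, a 4x2 source becomes a 2x4 destination in which vertically adjacent source elements sit side by side.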

22.

Invention application

SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

Legal status: Granted

Publication number: US20220100515A1

Publication date: 2022-03-31

Application number: US17549221

Application date: 2021-12-13

Applicant: Intel Corporation

Inventor: Bret TOLL, Christopher J. HUGHES, Dan BAUM, Elmoustapha OULD-AHMED-VALL, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE

IPC: G06F9/30

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
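The move between a 2D tile and a 1D vector described in this abstract can be sketched in software. This is an illustrative model only: it covers just the whole-row and whole-column groups, and the function names and the `kind` parameter are hypothetical, not taken from the patent.

```python
def move_from_1d(tile, vector, kind, index):
    """Copy a 1D vector into one row or one column of a 2D tile (in place)."""
    if kind == "row":
        if len(vector) != len(tile[index]):
            raise ValueError("vector length must match the row length")
        tile[index][:] = vector
    elif kind == "col":
        if len(vector) != len(tile):
            raise ValueError("vector length must match the column length")
        for r, value in enumerate(vector):
            tile[r][index] = value
    else:
        raise ValueError("kind must be 'row' or 'col'")
    return tile


def move_to_1d(tile, kind, index):
    """Return one row or one column of a 2D tile as a new 1D list."""
    if kind == "row":
        return list(tile[index])
    if kind == "col":
        return [row[index] for row in tile]
    raise ValueError("kind must be 'row' or 'col'")


if __name__ == "__main__":
    t = [[0, 0, 0], [0, 0, 0]]
    move_from_1d(t, [1, 2, 3], "row", 1)
    print(t, move_to_1d(t, "col", 1))
```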

23.

Invention application

SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

Legal status: Granted

Publication number: US20220100505A1

Publication date: 2022-03-31

Application number: US17549363

Application date: 2021-12-13

Applicant: Intel Corporation

Inventor: Bret TOLL, Christopher J. HUGHES, Dan BAUM, Elmoustapha OULD-AHMED-VALL, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE

IPC: G06F9/30, G06F17/16, G06F9/38

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.

24.

Invention application

APPARATUS AND METHOD FOR MULTIPLY, ADD/SUBTRACT, AND ACCUMULATE OF PACKED DATA ELEMENTS

Legal status: Granted

Publication number: US20210357215A1

Publication date: 2021-11-18

Application number: US17380930

Application date: 2021-07-20

Applicant: INTEL CORPORATION

Inventor: Venkateswara MADDURI, Elmoustapha OULD-AHMED-VALL, Mark CHARNEY, Robert VALENTINE, Jesus CORBAL

IPC: G06F9/30

Abstract: An apparatus and method for performing dual concurrent multiplications, subtraction/addition, and accumulation of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode an instruction to generate a decoded instruction; a first source register to store first and second packed data elements; a second source register to store third and fourth packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply the first and third packed data elements to generate a first temporary product and to concurrently multiply the second and fourth packed data elements to generate a second temporary product, the first through fourth packed data elements all being a first width; circuitry to negate the first temporary product to generate a negated first product; adder circuitry to add the first negated product to a first accumulated packed data element from a third source register to generate a first result, the first result being a second width which is at least twice as large as the first width; the adder circuitry to concurrently add the second temporary product to a second accumulated packed data element to generate a second result of the second width; the first and second results to be stored in specified first and second data element positions within a destination register.
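The arithmetic in this abstract reduces to two accumulations, one with a negated product and one without. The sketch below models only that arithmetic; it uses unbounded Python integers, so the first and second element widths and any overflow handling are not modeled, and the function name is hypothetical.

```python
def multiply_negate_accumulate(src1, src2, acc):
    """Model of the dual multiply, negate/add, and accumulate step.

    src1 = (first, second) packed elements, src2 = (third, fourth) packed
    elements, acc = (first, second) accumulated elements.
    Assumption: element widths and overflow behavior are ignored.
    """
    first, second = src1
    third, fourth = src2
    product1 = first * third        # first temporary product
    product2 = second * fourth      # second temporary product
    result1 = acc[0] + (-product1)  # negated first product is accumulated
    result2 = acc[1] + product2     # second product is accumulated as-is
    return result1, result2


if __name__ == "__main__":
    print(multiply_negate_accumulate((3, 4), (5, 6), (100, 200)))
```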

25.

Invention application

SYSTEMS, METHODS, AND APPARATUSES FOR TILE MATRIX MULTIPLICATION AND ACCUMULATION

Legal status: Granted

Publication number: US20210349720A1

Publication date: 2021-11-11

Application number: US17382917

Application date: 2021-07-22

Applicant: Intel Corporation

Inventor: Robert VALENTINE, Zeev SPERBER, Mark J. CHARNEY, Bret L. TOLL, Rinat RAPPOPORT, Stanislav SHWARTSMAN, Dan BAUM, Igor YANOVER, Elmoustapha OULD-AHMED-VALL, Menachem ADELMAN, Jesus CORBAL, Yuri GEBIL, Simon RUBANOVICH

IPC: G06F9/30, G06F7/485, G06F7/487, G06F17/16, G06F7/76, G06F9/38

Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, and store a result of the addition in the identified source/destination matrix operand and zero unconfigured columns of identified source/destination matrix operand are detailed.
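The multiply-accumulate with zeroing of unconfigured columns can be sketched as follows. This is an illustrative model under stated assumptions: the tile is a list of rows whose physical width may exceed `configured_cols`, and the negated variant is assumed to subtract the product from the accumulator. The function name and both parameters are hypothetical.

```python
def tile_multiply_accumulate(c, a, b, configured_cols, negate=False):
    """Compute C += A x B (or C -= A x B) in place, zeroing unconfigured columns.

    Assumptions: a is M x K, b is K x configured_cols, and c has M rows whose
    physical width is at least configured_cols.
    """
    sign = -1 if negate else 1
    inner = len(b)
    for i in range(len(a)):
        for j in range(configured_cols):
            dot = sum(a[i][p] * b[p][j] for p in range(inner))
            c[i][j] += sign * dot
        for j in range(configured_cols, len(c[i])):
            c[i][j] = 0                 # columns outside the configuration are zeroed
    return c


if __name__ == "__main__":
    acc = [[1, 1, 9], [1, 1, 9]]
    print(tile_multiply_accumulate(acc, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 2))
```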

26.

Invention application

SYSTEMS, APPARATUSES, AND METHODS FOR DUAL COMPLEX MULTIPLY ADD OF SIGNED WORDS

Legal status: Granted

Publication number: US20210157580A1

Publication date: 2021-05-27

Application number: US16614118

Application date: 2017-06-30

Applicant: Intel Corporation

Inventor: Elmoustapha OULD-AHMED-VALL, Venkateswara R. MADDURI, Mark J. CHARNEY, Robert VALENTINE

IPC: G06F9/30, G06F7/544

Abstract: Embodiments of systems, apparatuses, and methods for dual complex number multiplication and addition in a processor are described. For example, execution circuitry executes a decoded instruction to multiplex data values from positions in source operands to a multiplier, the source operands including pairs of complex numbers, calculate a real part of a product of each pair of complex numbers, add the real part of the product of a first pair of complex numbers to the real part of the product of a second pair of complex numbers to calculate a first real result, and add the real part of the product of a third pair of complex numbers to the real part of the product of a fourth pair of complex numbers to calculate a second real result, and store the results to corresponding positions in the destination operand.
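The real part of the product (a + bi)(c + di) is ac - bd, so the two results described in this abstract can be modeled directly. The sketch below is illustrative only: it represents each complex number as a (real, imaginary) tuple of Python integers, ignores the signed-word width and any rounding or saturation, and uses hypothetical function names.

```python
def real_of_product(x, y):
    """Real part of the product of two complex numbers given as (re, im) tuples."""
    (xr, xi), (yr, yi) = x, y
    return xr * yr - xi * yi


def dual_complex_multiply_add(src1, src2):
    """Return the two real results for four pairs of complex numbers.

    src1[i] and src2[i] form the i-th pair. Assumption: widths, rounding,
    and saturation of the real instruction are not modeled.
    """
    reals = [real_of_product(x, y) for x, y in zip(src1, src2)]
    first_result = reals[0] + reals[1]    # pairs one and two
    second_result = reals[2] + reals[3]   # pairs three and four
    return [first_result, second_result]


if __name__ == "__main__":
    a = [(1, 2), (3, 4), (5, 6), (7, 8)]
    b = [(1, 1), (1, 1), (2, 0), (0, 2)]
    print(dual_complex_multiply_add(a, b))
```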

27.

Invention application

SYSTEMS, METHODS, AND APPARATUSES FOR ZEROING A MATRIX

Legal status: Under examination (published)

Publication number: US20200241873A1

Publication date: 2020-07-30

Application number: US16487784

Application date: 2017-07-01

Applicant: Intel Corporation

Inventor: Robert VALENTINE, Menachem ADELMAN, Zeev SPERBER, Mark J. CHARNEY, Bret L. TOLL, Jesus CORBAL, Alexander F. HEINECKE, Barukh ZIV, Elmoustapha OULD-AHMED-VALL, Stanislav SHWARTSMAN

IPC: G06F9/30, G06F17/16

Abstract: Embodiments detailed herein relate to matrix operations. In particular, performing a matrix operation of zeroing a matrix in response to a single instruction. For example, a processor is detailed which includes decode circuitry to decode an instruction having fields for an opcode and a source/destination matrix operand identifier; and execution circuitry to execute the decoded instruction to zero each data element of the identified source/destination matrix.

28.

Invention application

SYSTEMS, METHODS, AND APPARATUSES FOR TILE MATRIX MULTIPLICATION AND ACCUMULATION

Legal status: Under examination (published)

Publication number: US20200233667A1

Publication date: 2020-07-23

Application number: US16487787

Application date: 2017-07-01

Applicant: Intel Corporation

Inventor: Robert VALENTINE, Zeev SPERBER, Mark J. CHARNEY, Bret L. TOLL, Rinat RAPPOPORT, Stanislav SHWARTSMAN, Dan BAUM, Igor YANOVER, Elmoustapha OULD-AHMED-VALL, Menachem ADELMAN, Jesus CORBAL, Yuri GEBIL, Simon RUBANOVICH

IPC: G06F9/30, G06F17/16

Abstract: Embodiments detailed herein relate to matrix operations. In particular, matrix (tile) multiply accumulate and negated matrix (tile) multiply accumulate are discussed. For example, in some embodiments decode circuitry to decode an instruction having fields for an opcode, an identifier for a first source matrix operand, an identifier of a second source matrix operand, and an identifier for a source/destination matrix operand; and execution circuitry to execute the decoded instruction to multiply the identified first source matrix operand by the identified second source matrix operand, add a result of the multiplication to the identified source/destination matrix operand, and store a result of the addition in the identified source/destination matrix operand and zero unconfigured columns of identified source/destination matrix operand are detailed.

29.

Invention application

SYSTEMS, APPARATUSES, AND METHODS FOR CONTROLLABLE SINE AND/OR COSINE OPERATIONS

Legal status: Under examination (published)

Publication number: US20200073658A1

Publication date: 2020-03-05

Application number: US16613537

Application date: 2017-06-30

Applicant: Intel Corporation

Inventor: Venkateswara R. MADDURI, Elmoustapha OULD-AHMED-VALL, Robert VALENTINE, Jesus CORBAL, Mark J. CHARNEY, Carl MURRAY, Milind GIRKAR, Bret TOLL

IPC: G06F9/30, G06F7/548

Abstract: Embodiments of systems, apparatuses, and methods for performing controllable sine and/or cosine operations in a processor are described. For example, execution circuitry executes a decoded instruction to compute at least a real output value and an imaginary output value based on at least a cosine calculation and a sine calculation, the cosine and sine calculations each based on an index value from a packed data source operand, add the index value with an index increment value from the packed data source operand to create an updated index value, and store the real output value, the imaginary output value, and the updated index value to a packed data destination operand.
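One step of the operation in this abstract produces a cosine value, a sine value, and an incremented index. The sketch below is a loose software model: the abstract does not say how the index maps to an angle, so the mapping angle = 2π · index / period and the `period` parameter are assumptions, as is the function name.

```python
import math


def sincos_step(index, increment, period=1024):
    """Return (real, imaginary, updated_index) for one step.

    Assumption (not from the abstract): the index selects one of `period`
    equally spaced angles on the unit circle.
    """
    angle = 2 * math.pi * (index % period) / period
    real = math.cos(angle)              # real output value from the cosine
    imag = math.sin(angle)              # imaginary output value from the sine
    return real, imag, index + increment


if __name__ == "__main__":
    idx = 0
    for _ in range(4):
        re_part, im_part, idx = sincos_step(idx, 256)
        print(round(re_part, 6), round(im_part, 6), idx)
```

Feeding the updated index back in, as the loop does, walks around the unit circle in fixed steps, which is the usual pattern for generating successive rotation factors.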

30.

Invention application

SYSTEMS, APPARATUSES, AND METHODS FOR GENERATING AN INDEX BY SORT ORDER AND REORDERING ELEMENTS BASED ON SORT ORDER

Legal status: Under examination (published)

Publication number: US20200050452A1

Publication date: 2020-02-13

Application number: US16367186

Application date: 2019-03-27

Applicant: Intel Corporation

Inventor: Dan BAUM, Ronen ZOHAR, Asit MISHRA, Prasoonkumar SURTI, Elmoustapha OULD-AHMED-VALL, Christopher HUGHES, Alexander HEINECKE

IPC: G06F9/30

Abstract: Disclosed embodiments relate to apparatuses, systems, and methods for performing sort indexing and/or permutation using an index. An exemplary apparatus includes decode circuitry to decode an instruction, the instruction to include a first field to identify a location of a source vector, a second field to identify a location of a destination vector, and an opcode to indicate to execution circuitry to execute the decoded instruction to sort values of the source vector and store a result of the sort in the destination vector by generating, per each element of the source vector, an index value using one or more comparisons of the element itself and to other data elements of the source vector, and permuting the values of the elements of the source vector based upon the index values for the elements and execution circuitry to execute the decoded instruction as indicated by the opcode.
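Sorting by a per-element index, as this abstract describes, can be modeled as a rank computation followed by a permutation. The sketch below is illustrative only: each element's index is the count of elements that must precede it, and ties are broken by original position, which is an assumption the abstract does not state. The function name is hypothetical.

```python
def sort_by_index(src):
    """Return (indices, sorted_values) using comparison-count ranking.

    Assumption: equal elements keep their original relative order.
    """
    indices = []
    for i, x in enumerate(src):
        # Compare this element against every element of the source vector.
        rank = sum(1 for j, y in enumerate(src) if y < x or (y == x and j < i))
        indices.append(rank)
    dst = [None] * len(src)
    for i, rank in enumerate(indices):
        dst[rank] = src[i]              # permute each value to its ranked position
    return indices, dst


if __name__ == "__main__":
    print(sort_by_index([30, 10, 20, 10]))
```

Every rank depends only on the source vector, so all indices could be computed independently of one another before the permutation is applied.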
