Patent search ap:("Intel Corporation") AND inv:"Georganas Page Evangelos"

1.

Invention publication
VECTOR PACKED MATRIX MULTIPLICATION AND ACCUMULATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS (Under examination - Published)

Publication (announcement) number: EP4485178A1

Publication (announcement) date: 2025-01-01

Application number: EP23213640.8

Application date: 2023-12-01

Applicant: Intel Corporation

Inventor: Heinecke, Alexander; Wong, Wing Shek; Robinson, Stephen; Sade, Raanan; Gradstein, Amit; Rubanovich, Simon; Espig, Michael; Baum, Dan; Georganas, Evangelos; Kalamkar, Dhiraj

IPC: G06F9/30

Abstract: Decoder circuitry to decode an instruction indicating a first vector register having a 128-bit lane to store a first matrix having two rows by K columns of data elements having a number of bits, a storage location having 128 bits to store a second matrix having K rows by two columns of data elements having the number of bits, and a second vector register having a 128-bit lane to store a third matrix having two rows by two columns of data elements having a greater number of bits. Execution circuitry is to perform operations for the instruction, including to generate and store a result matrix having two rows by two columns of result data elements having the greater number of bits in the 128-bit lane of the second vector register. The result matrix represents accumulation of the third matrix with a product matrix generated from matrix multiplication using the first and second matrices.
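
The arithmetic this abstract describes can be illustrated with a minimal Python sketch. The function name `mma_2xk` and the choice of K = 4 with 16-bit inputs and 32-bit accumulators are assumptions made for illustration (they are one combination that fills a 128-bit lane); the abstract itself only says "K columns" and "a greater number of bits".

```python
def mma_2xk(a, b, c):
    """Return c + a @ b for a 2xK matrix a, a Kx2 matrix b, and a 2x2 matrix c.

    Hypothetical helper: it models only the arithmetic, not the register
    layout or the element widths of a real instruction.
    """
    k = len(a[0])
    assert len(a) == 2 and len(b) == k and all(len(row) == 2 for row in b)
    out = [row[:] for row in c]          # copy the accumulator (third matrix)
    for i in range(2):                   # rows of the result
        for j in range(2):               # columns of the result
            for p in range(k):           # shared dimension K
                out[i][j] += a[i][p] * b[p][j]
    return out


# Assumed sizes: K = 4 with 16-bit inputs gives 2 * 4 * 16 = 128 bits per
# source; a 2x2 accumulator of 32-bit elements also occupies 128 bits.
a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 0], [0, 1]]
c = [[10, 20], [30, 40]]
result = mma_2xk(a, b, c)
print(result)
```

With these inputs the product matrix is [[4, 6], [12, 14]], so the accumulated result is [[14, 26], [42, 54]].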

2.

Invention publication
APPARATUS AND METHOD FOR A LOAD INSTRUCTION WITH A READ-SHARED INDICATION (Under examination - Published)

Publication (announcement) number: EP4485177A1

Publication (announcement) date: 2025-01-01

Application number: EP23211673.1

Application date: 2023-11-23

Applicant: Intel Corporation

Inventor: Hughes, Christopher J.; Wang, Zhe; Baum, Dan; Madduri, Venkateswara Rao; Heinecke, Alexander; Georganas, Evangelos; Dan, Chen; Nuzman, Joseph

IPC: G06F9/30, G06F12/00

Abstract: Techniques for loading data with a hint related to data sharing with other cores. For example, one embodiment of an apparatus comprises: a plurality of cores to process instructions; a first core of the plurality of cores comprising: decoder circuitry to decode a single instruction, the single instruction having a first field for an opcode to indicate a load operation to read data from a memory, a second field to indicate a memory address for a location of the data in the memory, and a third field to store a value to indicate whether the data is expected to be shared between the first core and at least a second core of the plurality of cores; execution circuitry to execute the single instruction to read the data from the location in the memory; and cache controller circuitry to store the data in one or more caches in a state selected based on the value.
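
As a rough illustration of the idea, the sketch below models a load whose hint value selects the state in which the line is cached. The `LineState` names (borrowed from MESI-style protocols), the `load_with_hint` function, and the dictionary-based memory and cache are all assumptions; the abstract says only that the state is "selected based on the value".

```python
from enum import Enum


class LineState(Enum):
    # Assumed MESI-style state names; the abstract does not name the states.
    SHARED = "S"
    EXCLUSIVE = "E"


def load_with_hint(memory, cache, address, read_shared):
    """Read memory[address] and cache it in a state chosen by the hint.

    Hypothetical model: `memory` and `cache` are plain dicts keyed by address.
    """
    data = memory[address]
    # The third instruction field (the hint) selects the cache state.
    state = LineState.SHARED if read_shared else LineState.EXCLUSIVE
    cache[address] = (data, state)
    return data


memory = {0x40: 7}
cache = {}
value = load_with_hint(memory, cache, 0x40, read_shared=True)
print(value, cache[0x40][1].name)
```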

3.

Invention publication
SYSTEMS AND METHODS TO TRANSPOSE VECTORS ON-THE-FLY WHILE LOADING FROM MEMORY (Under examination - Substantive examination)

Publication (announcement) number: EP4375835A2

Publication (announcement) date: 2024-05-29

Application number: EP24169357.1

Application date: 2019-10-15

Applicant: Intel Corporation

Inventor: Heinecke, Alexander F.; Georganas, Evangelos; Hughes, Christopher J.; Sade, Raanan; Valentine, Robert

IPC: G06F9/38

CPC classification number: G06F9/30032, G06F9/30036, G06F9/30109, G06F9/3875, G06F9/30038

Abstract: Disclosed embodiments relate to transposing vectors while loading from memory. In one example, a processor comprises: a register file comprising one or more vector registers; a memory interface to read a plurality of data elements from a memory; fetch circuitry to fetch an instruction; decode circuitry to decode the instruction, and execution circuitry to execute the instruction. The instruction includes a plurality of fields to indicate an opcode, a subset of the plurality of data elements to be broadcast, and locations of the plurality of data elements, the plurality of data elements arranged in a corresponding plurality of relative positions, wherein the plurality of data elements include a first group of data elements and a second group of data elements. The execution circuitry performs a permute operation and a broadcast operation in accordance with the instruction, wherein the broadcast operation is to cause the subset of the plurality of data elements to be broadcast to a plurality of the relative positions associated with a corresponding plurality of other subsets of the plurality of data elements, the subset of the plurality of data elements to replace the other corresponding subsets at the plurality of relative positions.
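
The permute and broadcast operations mentioned in this abstract can be sketched on plain Python lists. The function names `permute` and `broadcast_group`, and the fixed `group_size` parameter, are illustrative assumptions; the abstract does not specify how subsets are sized or selected.

```python
def permute(elements, order):
    """Reorder elements so that output position i holds elements[order[i]]."""
    return [elements[i] for i in order]


def broadcast_group(elements, group_size, src_group):
    """Replace every group of `group_size` elements with the chosen group.

    Hypothetical helper: the chosen subset overwrites the other subsets at
    their relative positions, as the abstract describes for the broadcast.
    """
    groups = [elements[i:i + group_size]
              for i in range(0, len(elements), group_size)]
    chosen = groups[src_group]
    return [x for _ in groups for x in chosen]


loaded = list(range(8))                     # eight elements read from memory
swapped = permute(loaded, [1, 0, 3, 2, 5, 4, 7, 6])
replicated = broadcast_group(loaded, group_size=2, src_group=1)
print(swapped)
print(replicated)
```

Here `replicated` is [2, 3, 2, 3, 2, 3, 2, 3]: the second pair has replaced every other pair.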

4.

Invention publication
APPARATUSES, METHODS, AND SYSTEMS FOR 8-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS (Under examination - Published)

Publication (announcement) number: EP4276608A3

Publication (announcement) date: 2024-01-10

Application number: EP23195872.9

Application date: 2021-09-14

Applicant: Intel Corporation

Inventor: Mellempudi, Naveen; Heinecke, Alexander F.; Valentine, Robert; Charney, Mark J.; Hughes, Christopher J.; Georganas, Evangelos; Sperber, Zeev; Gradstein, Amit; Rubanovich, Simon

IPC: G06F9/30

Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. For example, a processing unit comprises circuitry to perform operations corresponding to an instruction, the instruction to specify a first matrix having M rows by 4*K columns of 8-bit floating-point data elements, a second matrix having 4*K rows by N columns of 8-bit floating-point data elements, and a third matrix having M rows by N columns of 32-bit single precision floating-point data elements. The operations include, for each row m of the M rows of the first matrix and for each column n of the N columns of the second matrix: convert 4*K 8-bit floating-point data elements of the row m of the first matrix to 4*K corresponding higher precision floating-point data elements having a higher precision than an 8-bit floating-point data element, and convert 4*K 8-bit floating-point data elements of the column n of the second matrix to 4*K corresponding higher precision floating-point data elements having a higher precision than the 8-bit floating-point data element; multiply the 4*K higher precision floating-point data elements corresponding to the row m of the first matrix with corresponding ones of the 4*K higher precision floating-point data elements corresponding to the column n of the second matrix to generate 4*K products; accumulate the 4*K products with a 32-bit single precision floating-point data element corresponding to a row m of the M rows, and a column n of the N columns, of the third matrix, to generate a result 32-bit single precision floating-point data element; and store the result 32-bit single precision floating-point data element at the row m and the column n of the third matrix.
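
The convert, multiply, and accumulate steps can be sketched in Python as follows. Several things here are assumptions not stated in the abstract: the 8-bit format is taken to be E5M2 (which happens to equal the upper byte of an IEEE half-precision value, so it can be decoded with the standard `struct` module), the function names are hypothetical, and Python's double-precision floats stand in for the 32-bit accumulator, so rounding behaviour is not modelled.

```python
import struct


def fp8_e5m2_to_float(byte):
    """Decode one assumed-E5M2 byte by padding it to an IEEE half value."""
    # Little-endian half: low byte zero, high byte holds sign/exponent/mantissa.
    return struct.unpack("<e", bytes([0, byte]))[0]


def fp8_dot_product_accumulate(a, b, c):
    """Update c[m][n] += sum over k of a[m][k] * b[k][n], decoding FP8 inputs.

    a is M x 4K and b is 4K x N, both holding raw 8-bit encodings;
    c is M x N and holds ordinary Python floats.
    """
    depth = len(b)
    for m in range(len(a)):
        for n in range(len(b[0])):
            acc = c[m][n]
            for k in range(depth):
                # Convert both operands to higher precision, then multiply.
                acc += fp8_e5m2_to_float(a[m][k]) * fp8_e5m2_to_float(b[k][n])
            c[m][n] = acc
    return c


# One row of four elements (K = 1): 1.0, 2.0, 0.5, 1.0 in assumed E5M2.
a = [[0x3C, 0x40, 0x38, 0x3C]]
# One column of four elements, each 2.0.
b = [[0x40], [0x40], [0x40], [0x40]]
c = [[1.0]]
result = fp8_dot_product_accumulate(a, b, c)
print(result)
```

The four products are 2.0, 4.0, 1.0, and 2.0, so the accumulator goes from 1.0 to 10.0.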

5.

Invention publication
TILE LOAD AND EXPAND INSTRUCTION (Under examination - Published)

Publication (announcement) number: EP4293503A1

Publication (announcement) date: 2023-12-20

Application number: EP23171339.7

Application date: 2023-05-03

Applicant: Intel Corporation

Inventor: Adelman, Menachem; Gradstein, Amit; Rubanovich, Simon; Ziv, Barukh; Sherman, Uri; Rip, Dana; Mizrahi, Shahar; Baum, Dan; Rappoport, Rinat; Jain, Nilesh; Sperber, Zeev; Stupp, Gideon; Heinecke, Alexander; Hughes, Christopher; Georganas, Evangelos

IPC: G06F9/30

Abstract: Techniques and mechanisms for processor circuitry to execute a load and expand instruction of an instruction set to generate decompressed matrix data. In an embodiment, the instruction comprises a source operand which indicates a location from which compressed matrix data, and corresponding metadata, are to be accessed. A destination operand of the instruction indicates a location which is to receive decompressed metadata, which is generated, during execution of the instruction, based on the compressed matrix data and the corresponding metadata. The metadata comprises compression mask information which identifies which elements of the matrix have been masked from the compressed matrix data. In another embodiment, the instruction further comprises a count operand which identifies a total number of the unmasked matrix elements which are represented in the compressed matrix data.
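
A mask-driven expansion of this kind might look like the sketch below. The function name `load_and_expand`, the convention that a mask bit of 1 marks an element present in the compressed data, and the zero fill for absent elements are assumptions; the abstract does not fix the mask polarity or the fill value.

```python
def load_and_expand(compressed, mask, cols, count=None, fill=0):
    """Expand compressed matrix data into a dense row-major matrix.

    Hypothetical model: `mask` has one bit per matrix element (1 = element
    is stored in `compressed`), and `count` plays the role of the count
    operand, i.e. the number of elements stored in `compressed`.
    """
    if count is None:
        count = len(compressed)
    if len(compressed) != count or sum(mask) != count:
        raise ValueError("count, mask, and compressed data disagree")
    values = iter(compressed)
    # Walk the mask: take the next stored value, or insert the fill value.
    flat = [next(values) if bit else fill for bit in mask]
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]


compressed = [5, 7, 9]
mask = [1, 0, 0, 1,
        0, 1, 0, 0]
tile = load_and_expand(compressed, mask, cols=4, count=3)
print(tile)
```

For this input the expanded tile is [[5, 0, 0, 7], [0, 9, 0, 0]].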

6.

Invention publication
APPARATUSES, METHODS, AND SYSTEMS FOR 8-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS (Under examination - Published)

Publication (announcement) number: EP4276608A2

Publication (announcement) date: 2023-11-15

Application number: EP23195872.9

Application date: 2021-09-14

Applicant: Intel Corporation

Inventor: Mellempudi, Naveen; Heinecke, Alexander F.; Valentine, Robert; Charney, Mark J.; Hughes, Christopher J.; Georganas, Evangelos; Sperber, Zeev; Gradstein, Amit; Rubanovich, Simon

IPC: G06F9/30

Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. For example, a processing unit comprises circuitry to perform operations corresponding to an instruction, the instruction to specify a first matrix having M rows by 4*K columns of 8-bit floating-point data elements, a second matrix having 4*K rows by N columns of 8-bit floating-point data elements, and a third matrix having M rows by N columns of 32-bit single precision floating-point data elements. The operations include, for each row m of the M rows of the first matrix and for each column n of the N columns of the second matrix: convert 4*K 8-bit floating-point data elements of the row m of the first matrix to 4*K corresponding higher precision floating-point data elements having a higher precision than an 8-bit floating-point data element, and convert 4*K 8-bit floating-point data elements of the column n of the second matrix to 4*K corresponding higher precision floating-point data elements having a higher precision than the 8-bit floating-point data element; multiply the 4*K higher precision floating-point data elements corresponding to the row m of the first matrix with corresponding ones of the 4*K higher precision floating-point data elements corresponding to the column n of the second matrix to generate 4*K products; accumulate the 4*K products with a 32-bit single precision floating-point data element corresponding to a row m of the M rows, and a column n of the N columns, of the third matrix, to generate a result 32-bit single precision floating-point data element; and store the result 32-bit single precision floating-point data element at the row m and the column n of the third matrix.

7.

Invention publication
MATRIX TRANSPOSE AND MULTIPLY (Under examination - Published)

Publication (announcement) number: EP4468146A3

Publication (announcement) date: 2025-02-19

Application number: EP24205150.6

Application date: 2020-11-26

Applicant: Intel Corporation

Inventor: Adelman, Menachem; Valentine, Robert; Ziv, Barukh; Gradstein, Amit; Rubanovich, Simon; Sperber, Zeev; Charney, Mark J.; Hughes, Christopher J.; Heinecke, Alexander F.; Georganas, Evangelos; Pham, Binh

IPC: G06F9/30

Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor comprises: a plurality of registers to store a plurality of packed data elements including a first plurality of packed data elements of a first source matrix tile and a second plurality of packed data elements of a second source matrix tile, the first and second source matrix tiles comprising respective portions of a first source matrix and a second source matrix, and wherein each packed data element of the plurality of packed data elements has an element width; a decoder to decode one or more instructions, at least one instruction of the one or more instructions including an opcode field configured to specify an opcode, a first source operand configured to indicate the first source matrix tile, a second source operand configured to indicate the second source matrix tile, and a destination operand configured to indicate a result matrix tile; and execution circuitry to, in response to the one or more instructions, transpose the first source matrix tile in accordance with a granularity equal to the element width to generate a first transposed source matrix tile and multiply the first transposed source matrix tile and the second source matrix tile. The execution circuitry comprises: a plurality of multipliers to multiply data elements of the first transposed source matrix tile and corresponding data elements of the second source matrix tile to produce a corresponding plurality of products; and one or more accumulators to add groups of the products to generate corresponding result data elements in the result matrix tile.
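
In scalar terms, the operation in this abstract amounts to computing the product of the transposed first tile with the second tile. The sketch below is only an arithmetic model under that reading; the function name `transpose_and_multiply` and the list-of-lists tile representation are assumptions.

```python
def transpose_and_multiply(a_tile, b_tile):
    """Return transpose(a_tile) @ b_tile for tiles stored as lists of rows.

    Hypothetical helper: a_tile is K x M and b_tile is K x N, so the result
    is M x N. Each result element is a sum over one group of K products.
    """
    a_t = [list(col) for col in zip(*a_tile)]   # element-granular transpose
    inner = len(b_tile)
    assert all(len(row) == inner for row in a_t)
    return [[sum(a_t[i][k] * b_tile[k][j] for k in range(inner))
             for j in range(len(b_tile[0]))]
            for i in range(len(a_t))]


a_tile = [[1, 2], [3, 4], [5, 6]]       # 3 x 2, transposed to 2 x 3
b_tile = [[1, 0], [0, 1], [1, 1]]       # 3 x 2
result = transpose_and_multiply(a_tile, b_tile)
print(result)
```

The transposed first tile is [[1, 3, 5], [2, 4, 6]], giving a result of [[6, 8], [8, 10]].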

8.

Invention publication
MATRIX TRANSPOSE AND MULTIPLY (Under examination - Published)

Publication (announcement) number: EP4462249A3

Publication (announcement) date: 2025-02-19

Application number: EP24203555.8

Application date: 2020-11-26

Applicant: Intel Corporation

Inventor: Adelman, Menachem; Valentine, Robert; Ziv, Barukh; Gradstein, Amit; Rubanovich, Simon; Sperber, Zeev; Charney, Mark J.; Hughes, Christopher J.; Heinecke, Alexander F.; Georganas, Evangelos; Pham, Binh

IPC: G06F9/30

Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, an apparatus comprises decode circuitry to decode an instance of an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, a second source operand field to specify a second source matrix location, and a third operand field to specify a source/destination matrix location; and execution circuitry to, in response to the opcode of the decoded instance of the instruction, transpose columns of data element pairs of the first source matrix into rows, perform a dot product of data element pairs of the transposed columns of data element pairs of the first source matrix and corresponding row data element pairs of the second source matrix, and add a result of the dot product to a corresponding row data element of the source/destination matrix.

9.

Invention publication
APPARATUS AND METHOD FOR DOWN-CONVERTING AND INTERLEAVING MULTIPLE FLOATING POINT VALUES (Under examination - Substantive examination)

Publication (announcement) number: EP4321992A3

Publication (announcement) date: 2024-05-01

Application number: EP23210931.4

Application date: 2020-02-07

Applicant: Intel Corporation

Inventor: Adelman, Menachem; Valentine, Robert; Ziv, Barukh; Gradstein, Amit; Rubanovich, Simon; Heinecke, Alexander; Georganas, Evangelos

IPC: G06F9/30

CPC classification number: G06F7/483, G06F2207/3828, G06F9/30032, G06F9/30036, G06F9/30025

Abstract: An apparatus and method for down-converting and interleaving data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed data elements; a second source register to store a second plurality of packed data elements; a destination register to store a third plurality and a fourth plurality of packed data elements, each of the third and fourth plurality of packed data elements to be encoded with fewer bits than each of the first and second plurality of packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: down-conversion circuitry to down-convert each of the first plurality of packed data elements to generate one of the third plurality of packed data elements and to down-convert each of the second plurality of packed data elements to generate one of the fourth plurality of packed data elements; interleave circuitry to interleave the third plurality of packed data elements with the fourth plurality of packed data elements within the destination register.
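
A down-convert-and-interleave step could be modelled as below. The abstract says only that the destination elements use fewer bits than the sources, so the specific formats (32-bit float to bfloat16), the truncating conversion, the element-by-element interleave order, and the function names are all assumptions made for this sketch.

```python
import struct


def fp32_to_bf16_bits(value):
    """Return the 16-bit bfloat16 pattern for a float, by truncation.

    Assumed conversion: keep the upper 16 bits of the IEEE single-precision
    encoding. Real hardware may round instead of truncating.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return bits >> 16


def downconvert_and_interleave(src1, src2):
    """Down-convert both sources and interleave them into one destination."""
    dest = []
    for x, y in zip(src1, src2):
        dest.append(fp32_to_bf16_bits(x))   # element from the first source
        dest.append(fp32_to_bf16_bits(y))   # element from the second source
    return dest


src1 = [1.0, 2.0]
src2 = [-1.0, 0.5]
dest = downconvert_and_interleave(src1, src2)
print([hex(v) for v in dest])
```

The destination holds the patterns 0x3f80, 0xbf80, 0x4000, 0x3f00, alternating between the two sources.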

10.

Invention publication
BFLOAT16 COMPARISON INSTRUCTIONS (Under examination - Substantive examination)

Publication (announcement) number: EP4357914A2

Publication (announcement) date: 2024-04-24

Application number: EP24161663.0

Application date: 2022-07-08

Applicant: Intel Corporation

Inventor: Heinecke, Alexander; Adelman, Menachem; Valentine, Robert; Sperber, Zeev; Gradstein, Amit; Charney, Mark; Georganas, Evangelos; Kalamkar, Dhiraj; Hughes, Christopher; Anderson, Cristina

IPC: G06F9/30

CPC classification number: G06F9/30036, G06F9/30021, G06F9/30094

Abstract: Techniques for comparing BF16 data elements are described. An exemplary instruction is to cause operations including to: provide, for each data element position of BF16 data elements of first and second packed data source operands, a data element result, wherein: for a predicate value that is a first value, the data element result is to include a corresponding data element that is a result of either a maximum comparison or a minimum comparison of a pair of corresponding BF16 data elements, wherein, when the BF16 data elements of the pair of corresponding BF16 data elements are both zero, of either sign, the data element result is to include the corresponding BF16 data element of the second packed data source operand; and for a predicate value that is a second value, the data element result is to include a corresponding data element that is either zero or remains unchanged.
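
The per-element selection rules in this abstract can be sketched with ordinary Python floats standing in for BF16 values. The function name `bf16_minmax`, the `use_max` and `zeroing` flags, and the treatment of predicate value 1 as "the first value" are assumptions; NaN handling is not mentioned in the abstract and is not modelled.

```python
def bf16_minmax(src1, src2, predicate, dest, use_max=True, zeroing=False):
    """Apply a predicated max/min per element position.

    Hypothetical model: where the predicate bit is set, the result is the
    max (or min) of the pair, except that two zeros of either sign yield
    the element from the second source. Where the bit is clear, the result
    is zero (zeroing) or the old destination value (unchanged).
    """
    out = []
    for a, b, p, old in zip(src1, src2, predicate, dest):
        if p:
            if a == 0.0 and b == 0.0:
                out.append(b)                        # both zero: second source wins
            else:
                out.append(max(a, b) if use_max else min(a, b))
        else:
            out.append(0.0 if zeroing else old)
    return out


src1 = [1.0, 0.0, 3.0]
src2 = [2.0, -0.0, 1.0]
predicate = [1, 1, 0]
dest = [9.0, 9.0, 9.0]
result = bf16_minmax(src1, src2, predicate, dest)
print(result)
```

The second position shows the zero rule: +0.0 and -0.0 compare equal, so the result takes -0.0 from the second source rather than depending on comparison order.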
