Patent search ap:("INTEL CORPORATION") AND inv:"Raanan Sade" Page 5

41.

Published invention application
SYSTEMS AND METHODS TO STORE A TILE REGISTER PAIR TO MEMORY [Under examination, published]

Publication (announcement) number: US20240143328A1

Publication (announcement) date: 2024-05-02

Application number: US18386407

Application date: 2023-11-02

Applicant: Intel Corporation

Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman

IPC: G06F9/30

CPC classification number: G06F9/30145, G06F9/30036, G06F9/30043

Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
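
The chunked store described in this abstract can be pictured with a minimal software sketch. It is an illustration only, not the claimed hardware: the dictionary layout with "left" and "right" keys, the function name store_tile_pair, and the treatment of each tile's rows separately are all assumptions made here.

```python
def store_tile_pair(src_pair, dst_pair, chunk):
    """Copy the left and right tiles of src_pair into dst_pair.

    Hypothetical model: each pair is a dict with "left" and "right" tiles,
    each tile a list of equal-length rows. `chunk` plays the role of C.
    """
    for side in ("left", "right"):
        for r, row in enumerate(src_pair[side]):
            for c0 in range(0, len(row), chunk):
                # One store covers up to `chunk` elements of a single row.
                dst_pair[side][r][c0:c0 + chunk] = row[c0:c0 + chunk]
    return dst_pair
```

With chunk smaller than the row length, each row takes several stores; the final chunk of a row may be shorter than the others.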

42.

Published invention application
SYSTEMS, METHODS, AND APPARATUSES FOR MATRIX OPERATIONS [Under examination, published]

Publication (announcement) number: US20240143325A1

Publication (announcement) date: 2024-05-02

Application number: US18386771

Application date: 2023-11-03

Applicant: Intel Corporation

Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman

IPC: G06F9/30, G06F17/16

CPC classification number: G06F9/30036, G06F9/30101, G06F17/16

Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to store configuration information about usage of storage for two-dimensional data structures at the memory address.

43.

Granted invention patent
Apparatus and method for complex multiplication [In force]

Publication (announcement) number: US11960884B2

Publication (announcement) date: 2024-04-16

Application number: US17517351

Application date: 2021-11-02

Applicant: Intel Corporation

Inventor: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov

IPC: G06F9/30, G06F7/48, G06F9/38, G06F17/10

CPC classification number: G06F9/3001, G06F7/4812, G06F9/30014, G06F9/30109, G06F9/3013, G06F9/3016, G06F7/4806, G06F9/30167, G06F9/382, G06F9/3824, G06F17/10

Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.
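
The two-operation split in this abstract can be sketched in a few lines. The abstract does not say which terms belong to which operation, so the grouping below (first operation uses the real part of the first operand, second operation uses its imaginary part) is an assumption, as is the function name complex_mul_two_ops.

```python
def complex_mul_two_ops(x, y):
    """Multiply complex numbers x = (a, b) and y = (c, d) in two steps.

    (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    """
    a, b = x
    c, d = y
    # First operation (assumed grouping): first term of each component.
    real_1, imag_1 = a * c, a * d
    # Second operation: second term of each component.
    real_2, imag_2 = -(b * d), b * c
    return (real_1 + real_2, imag_1 + imag_2)


print(complex_mul_two_ops((1, 2), (3, 4)))  # (-5, 10)
```

Each operation produces one term for the real component and one for the imaginary component, so the two operations have the same shape.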

44.

Granted invention patent
Systems and methods for performing nibble-sized operations on matrix elements [In force]

Publication (announcement) number: US11886875B2

Publication (announcement) date: 2024-01-30

Application number: US16232599

Application date: 2018-12-26

Applicant: Intel Corporation

Inventor: Elmoustapha Ould-Ahmed-Vall, Jonathan D. Pearce, Dan Baum, Guei-Yuan Lueh, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC: G06F9/30

CPC classification number: G06F9/30036, G06F9/3001, G06F9/30018, G06F9/30038

Abstract: Disclosed embodiments relate to systems and methods for performing nibble-sized operations on matrix elements. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction, the fetched instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode to indicate the processor is to, for each pair of corresponding elements of the first and second source matrices, logically partition each element into nibble-sized partitions, perform an operation indicated by the instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix. The exemplary processor includes execution circuitry to execute the decoded instruction as per the opcode.
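
A rough software model of the per-nibble behaviour follows. The 8-bit element width, the wraparound of each result to 4 bits, and the names nibble_op and matrix_nibble_op are assumptions chosen for illustration; the abstract does not fix any of them.

```python
def nibble_op(x, y, op, bits=8):
    """Apply op to each pair of 4-bit partitions of x and y."""
    out = 0
    for shift in range(0, bits, 4):
        nx = (x >> shift) & 0xF
        ny = (y >> shift) & 0xF
        # Keep only 4 bits so a result never spills into the next nibble.
        out |= (op(nx, ny) & 0xF) << shift
    return out


def matrix_nibble_op(a, b, op, bits=8):
    """Element-wise nibble_op over two equally shaped matrices."""
    return [[nibble_op(x, y, op, bits) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


add = lambda p, q: p + q
print(hex(nibble_op(0x1F, 0x21, add)))  # 0x30
```

In the example the low nibbles (0xF and 0x1) sum to 0x10, which wraps to 0x0 without carrying into the high nibble.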

45.

Granted invention patent
Systems and methods for performing matrix compress and decompress instructions [In force]

Publication (announcement) number: US11748103B2

Publication (announcement) date: 2023-09-05

Application number: US17672253

Application date: 2022-02-15

Applicant: Intel Corporation

Inventor: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke

IPC: G06F9/30, G06F9/38

CPC classification number: G06F9/30178, G06F9/3013, G06F9/30036, G06F9/30145, G06F9/3802

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
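
The first compression option in this abstract (pack the non-zero elements and record their positions in a header) can be sketched as below. The dictionary layout, the (row, column) header entries, and the function names compress and decompress are assumptions; the patent's actual header encoding is not given in the abstract.

```python
def compress(matrix):
    """Pack non-zero elements and record their positions in a header."""
    header, values = [], []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                header.append((r, c))   # position of this non-zero element
                values.append(value)    # packed payload, in row-major order
    shape = (len(matrix), len(matrix[0]) if matrix else 0)
    return {"shape": shape, "header": header, "values": values}


def decompress(compressed):
    """Rebuild the dense matrix from the header and packed values."""
    rows, cols = compressed["shape"]
    matrix = [[0] * cols for _ in range(rows)]
    for (r, c), value in zip(compressed["header"], compressed["values"]):
        matrix[r][c] = value
    return matrix
```

The scheme only saves space when the matrix is sparse enough that the header plus packed values is smaller than the dense form.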

46.

Granted invention patent
Systems for performing instructions to quickly convert and use tiles as 1D vectors [In force]

Publication (announcement) number: US11579880B2

Publication (announcement) date: 2023-02-14

Application number: US17240882

Application date: 2021-04-26

Applicant: INTEL CORPORATION

Inventor: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC: G06F9/30

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
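
As a hedged illustration of the "move from 1D" case, the sketch below writes a vector into a row or a column of a tile, optionally starting partway along it. Only two of the listed group kinds are modelled, and the name move_from_1d and its parameters (group, index, start) are hypothetical.

```python
def move_from_1d(tile, vector, group, index, start=0):
    """Write `vector` into part of `tile`, modifying the tile in place.

    group="row":    fills tile[index][start : start + len(vector)]
    group="column": fills tile[start : start + len(vector)][index]
    """
    if group == "row":
        for j, value in enumerate(vector):
            tile[index][start + j] = value
    elif group == "column":
        for i, value in enumerate(vector):
            tile[start + i][index] = value
    else:
        raise ValueError(f"unsupported group kind: {group}")
    return tile
```

A vector shorter than the row or column gives the "part of a row" and "part of a column" cases named in the abstract.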

47.

Granted invention patent
Systems for performing instructions for fast element unpacking into 2-dimensional registers [In force]

Publication (announcement) number: US11507376B2

Publication (announcement) date: 2022-11-22

Application number: US17152160

Application date: 2021-01-19

Applicant: INTEL CORPORATION

Inventor: Bret Toll, Alexander F. Heinecke, Christopher J. Hughes, Ronen Zohar, Michael Espig, Dan Baum, Raanan Sade, Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall

IPC: G06F17/16, G06F12/02, G06F9/30, G06F12/06, G06F9/38, G06T1/20, G06F3/06, G06F12/0897, G06F12/0875, G06F9/345

Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
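
The AOS-to-SOA unpacking can be pictured with the short sketch below. It assumes the source is a flat list in which structure s starts at offset s * stride and holds its K fields contiguously; the function name unpack_aos_to_soa is made up for this example.

```python
def unpack_aos_to_soa(aos, n, k, stride):
    """Split N structures of K fields each into K groups of N elements.

    `stride` is the distance between same-typed elements of consecutive
    structures; it may exceed k when structures carry padding.
    """
    return [[aos[s * stride + field] for s in range(n)]
            for field in range(k)]


# Two (x, y, z) structures, each padded to a stride of 4.
aos = ["x0", "y0", "z0", None, "x1", "y1", "z1", None]
print(unpack_aos_to_soa(aos, n=2, k=3, stride=4))
# [['x0', 'x1'], ['y0', 'y1'], ['z0', 'z1']]
```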

48.

Granted invention patent
Efficient implementation of complex vector fused multiply add and complex vector multiply [In force]

Publication (announcement) number: US11455167B2

Publication (announcement) date: 2022-09-27

Application number: US16701082

Application date: 2019-12-02

Applicant: Intel Corporation

Inventor: Raanan Sade, Thierry Pons, Amit Gradstein, Zeev Sperber, Mark J. Charney, Robert Valentine, Eyal Oz-Sinay

IPC: G06F9/30, G06F9/38, G06F17/16

Abstract: Disclosed embodiments relate to efficient complex vector multiplication. In one example, an apparatus includes execution circuitry, responsive to an instruction having fields to specify multiplier, multiplicand, and summand complex vectors, to perform two operations: first, to generate a double-even multiplicand by duplicating even elements of the specified multiplicand, and to generate a temporary vector using a fused multiply-add (FMA) circuit having A, B, and C inputs set to the specified multiplier, the double-even multiplicand, and the specified summand, respectively, and second, to generate a double-odd multiplicand by duplicating odd elements of the specified multiplicand, to generate a swapped multiplier by swapping even and odd elements of the specified multiplier, and to generate a result using a second FMA circuit having its even product negated, and having A, B, and C inputs set to the swapped multiplier, the double-odd multiplicand, and the temporary vector, respectively.
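
The two-pass scheme in this abstract can be checked with a small model. It assumes complex vectors are stored interleaved, with real parts at even indices and imaginary parts at odd indices, and it uses ordinary Python arithmetic in place of a hardware FMA (so it does not model single-rounding behaviour). The name complex_vector_fma is hypothetical.

```python
def complex_vector_fma(a, b, c):
    """Compute a * b + c for interleaved complex vectors.

    a = multiplier, b = multiplicand, c = summand; each is a flat list
    [re0, im0, re1, im1, ...] of even length.
    """
    n = len(a)
    # First operation: duplicate even (real) elements of the multiplicand.
    double_even = [b[i - (i % 2)] for i in range(n)]
    temp = [a[i] * double_even[i] + c[i] for i in range(n)]
    # Second operation: duplicate odd (imaginary) elements and swap
    # even/odd elements of the multiplier.
    double_odd = [b[i - (i % 2) + 1] for i in range(n)]
    swapped = [a[i ^ 1] for i in range(n)]
    result = []
    for i in range(n):
        product = swapped[i] * double_odd[i]
        if i % 2 == 0:
            product = -product  # even product negated: -(a_im * b_im)
        result.append(product + temp[i])
    return result


# (1 + 2i)(3 + 4i) + (5 + 6i) = 0 + 16i
print(complex_vector_fma([1, 2], [3, 4], [5, 6]))  # [0, 16]
```

After the first pass the even lanes hold a_re*b_re + c_re and the odd lanes hold a_im*b_re + c_im; the second pass adds -a_im*b_im and a_re*b_im respectively, which completes the complex product plus summand.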

49.

Granted invention patent
Systems for performing instructions to quickly convert and use tiles as 1D vectors [In force]

Publication (announcement) number: US10990396B2

Publication (announcement) date: 2021-04-27

Application number: US16145066

Application date: 2018-09-27

Applicant: Intel Corporation

Inventor: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC: G06F9/30

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.

50.

Granted invention patent
Systems and methods to transpose vectors on-the-fly while loading from memory [In force]

Publication (announcement) number: US10970072B2

Publication (announcement) date: 2021-04-06

Application number: US16231050

Application date: 2018-12-21

Applicant: Intel Corporation

Inventor: Alexander F. Heinecke, Evangelos Georganas, Christopher J. Hughes, Raanan Sade, Robert Valentine

IPC: G06F9/30

Abstract: Disclosed embodiments relate to transposing vectors while loading from memory. In one example, a processor includes a register file, a memory interface, fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode, a destination vector register, and a source vector having N groups of elements, N being a positive integer, the opcode to indicate the processor is to fetch the source vector, generate write data comprising one or more N-tuples, each N-tuple comprising corresponding elements from each of the N groups of elements, and write the write data to the destination vector register, and execution circuitry to execute the decoded instruction as per the opcode, the execution circuitry has a shuffle pipeline disposed between the memory and the register file, the shuffle pipeline to fetch, decode, and execute further instances of the instruction at one instruction per clock cycle.
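
The N-tuple construction in this abstract amounts to interleaving the groups. The sketch below assumes the N groups are contiguous and of equal length within the source vector, which the abstract does not state; load_transposed is a hypothetical name and the shuffle pipeline itself is not modelled.

```python
def load_transposed(source, n):
    """Interleave N equal-length contiguous groups into N-tuples."""
    if len(source) % n != 0:
        raise ValueError("source length must be a multiple of n")
    group_len = len(source) // n
    groups = [source[g * group_len:(g + 1) * group_len] for g in range(n)]
    write_data = []
    # zip(*groups) yields one N-tuple per element position.
    for n_tuple in zip(*groups):
        write_data.extend(n_tuple)
    return write_data


print(load_transposed([1, 2, 3, 4, 10, 20, 30, 40], 2))
# [1, 10, 2, 20, 3, 30, 4, 40]
```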
