Patent search ap:("INTEL CORPORATION") AND inv:"Raanan SADE" Page 4

31.

Invention application
SYSTEMS AND METHODS TO ACCELERATE MULTIPLICATION OF SPARSE MATRICES [Under examination - published]

Publication (announcement) number: US20200210517A1

Publication (announcement) date: 2020-07-02

Application number: US16234374

Application date: 2018-12-27

Applicant: Intel Corporation

Inventor: Dan BAUM , Chen KOREN , Elmoustapha OULD-AHMED-VALL , Michael ESPIG , Christopher J. HUGHES , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE

IPC: G06F17/16 , G06F9/38 , G06F9/30

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices and broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
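
The bitmask matching described in this abstract can be illustrated in software. The following is a minimal Python sketch, not the patented hardware: the function names `nz_bitmask` and `sparse_multiply_accumulate` are hypothetical, matrices are plain lists of lists, and the PE grid and the "up to two NZ elements at a time" broadcast are not modeled. It only shows how ANDing a row bitmask with a column bitmask selects the positions where a multiply is actually needed.

```python
def nz_bitmask(values):
    """Return an integer whose bit k is set when values[k] is non-zero."""
    mask = 0
    for k, v in enumerate(values):
        if v != 0:
            mask |= 1 << k
    return mask


def sparse_multiply_accumulate(a, b, c):
    """Compute C += A @ B in place, multiplying only matching non-zero pairs."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # One bitmask per row of A and one per column of B.
    row_masks = [nz_bitmask(a[i]) for i in range(rows)]
    col_masks = [nz_bitmask([b[k][j] for k in range(inner)]) for j in range(cols)]
    for i in range(rows):
        for j in range(cols):
            # Bits set in both masks mark positions where both operands are non-zero.
            matching = row_masks[i] & col_masks[j]
            k = 0
            while matching:
                if matching & 1:
                    c[i][j] += a[i][k] * b[k][j]
                matching >>= 1
                k += 1
    return c
```

For two diagonal 2x2 inputs, only the two diagonal positions trigger a multiplication; every other accumulator entry is left unchanged.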

32.

Invention application
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO CONVERT TO 16-BIT FLOATING-POINT FORMAT [Under examination - published]

Publication (announcement) number: US20190079762A1

Publication (announcement) date: 2019-03-14

Application number: US16186384

Application date: 2018-11-09

Applicant: Intel Corporation

Inventor: Alexander F. HEINECKE , Robert VALENTINE , Mark J. CHARNEY , Raanan SADE , Menachem ADELMAN , Zeev SPERBER , Amit GRADSTEIN , Simon RUBANOVICH

IPC: G06F9/30 , G06F9/38

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
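
As a rough illustration of the conversion step, the sketch below assumes the 16-bit target is a bfloat16-style format (the upper 16 bits of an IEEE 754 single-precision value) and that rounding is round-to-nearest-even. Neither assumption is stated in the abstract, and the function names are hypothetical. Results are returned as raw 16-bit integers.

```python
import struct


def float32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bf16_bits(x):
    """Convert one value to an assumed bfloat16 bit pattern (round-to-nearest-even)."""
    bits = float32_bits(x)
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Truncate, then force a mantissa bit so the result remains a NaN.
        return (bits >> 16) | 0x0040
    # Add just under half an output ULP, plus one more when the kept LSB is odd,
    # so that exact ties round to the even neighbour.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def convert_vector(src):
    """Convert each single-precision element to a 16-bit element, position by position."""
    return [to_bf16_bits(x) for x in src]
```

Under these assumptions, `1.00390625` (exactly halfway between two representable outputs) rounds down to the even pattern `0x3F80`, while `1.01171875` rounds up to `0x3F82`.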

33.

Invention application
SYSTEMS, METHODS, AND APPARATUSES UTILIZING CPU STORAGE WITH A MEMORY REFERENCE [Under examination - published]

Publication (announcement) number: US20190042448A1

Publication (announcement) date: 2019-02-07

Application number: US15853640

Application date: 2017-12-22

Applicant: Intel Corporation

Inventor: Raanan SADE , Jason BRANDT , Mark J. CHARNEY , Joseph NUZMAN , Leena PUTHIYEDATH , Rinat RAPPOPORT , Vivekananthan SANJEEPAN , Robert VALENTINE

IPC: G06F12/0877 , G06F12/0846 , G06F12/0813 , G06F12/0895 , G06F9/30

Abstract: Implementations of using tiles for caching are detailed. In some implementations, an instruction execution circuitry executes one or more instructions, a register state cache coupled to the instruction execution circuitry holds thread register state in a plurality of registers, and backing storage pointer storage stores a backing storage pointer, wherein the backing storage pointer is to reference a state backing storage area in external memory to store the thread register state stored in the register state cache.

34.

Invention application
SYSTEMS AND METHODS FOR PERFORMING MATRIX COMPRESS AND DECOMPRESS INSTRUCTIONS [Under examination - published]

Publication (announcement) number: US20190042257A1

Publication (announcement) date: 2019-02-07

Application number: US16144902

Application date: 2018-09-27

Applicant: Intel Corporation

Inventor: Dan BAUM , Michael ESPIG , James GUILFORD , Wajdi K. FEGHALI , Raanan SADE , Christopher J. HUGHES , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Vinodh GOPAL , Ronen ZOHAR , Alexander F. HEINECKE

IPC: G06F9/30 , G06F9/38

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
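
The first of the two compression options (packing non-zero elements and recording their positions in a header) can be sketched as follows. This is an illustrative Python model only: the dictionary layout with `shape`, `header`, and `values` keys and the row-major position encoding are assumptions, not the format defined by the patent.

```python
def compress(matrix):
    """Pack non-zero elements together and record each one's position in a header."""
    rows, cols = len(matrix), len(matrix[0])
    header, packed = [], []
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] != 0:
                header.append(r * cols + c)  # assumed row-major linear position
                packed.append(matrix[r][c])
    return {"shape": (rows, cols), "header": header, "values": packed}


def decompress(blob):
    """Rebuild the dense matrix by scattering packed values to their header positions."""
    rows, cols = blob["shape"]
    matrix = [[0] * cols for _ in range(rows)]
    for pos, value in zip(blob["header"], blob["values"]):
        matrix[pos // cols][pos % cols] = value
    return matrix
```

Compressing and then decompressing returns the original matrix, and the storage saved grows with the fraction of zero elements.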

35.

Invention application
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO TRANSPOSE RECTANGULAR TILES [Under examination - published]

Publication (announcement) number: US20190042202A1

Publication (announcement) date: 2019-02-07

Application number: US16144889

Application date: 2018-09-27

Applicant: Intel Corporation

Inventor: Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Simon RUBANOVICH , Amit GRADSTEIN , Zeev SPERBER , Bret TOLL , Jesus CORBAL , Christopher J. HUGHES , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL

IPC: G06F7/78 , G06F9/30 , G06F9/38 , G06F15/173

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.
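
The row-to-column mapping in this abstract is an ordinary transpose applied to two independent source/destination pairs. A minimal Python sketch, with hypothetical function names and tiles modeled as lists of lists, might look like this:

```python
def transpose_tile(src):
    """Write each row of a rectangular tile into the corresponding destination column."""
    rows, cols = len(src), len(src[0])
    dst = [[None] * rows for _ in range(cols)]
    for r in range(rows):
        for c in range(cols):
            dst[c][r] = src[r][c]
    return dst


def transpose_pair(src1, src2):
    """Transpose two source tiles into two destination tiles, as one operation would."""
    return transpose_tile(src1), transpose_tile(src2)
```

A 2x3 source yields a 3x2 destination, which is why the instruction must treat sources and destinations as rectangular rather than square.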

36.

Invention publication
SYSTEMS, METHODS, AND APPARATUS FOR TILE CONFIGURATION [Under examination - published]

Publication (announcement) number: US20240111533A1

Publication (announcement) date: 2024-04-04

Application number: US18534012

Application date: 2023-12-08

Applicant: Intel Corporation

Inventor: Menachem ADELMAN , Robert VALENTINE , Zeev SPERBER , Mark J. CHARNEY , Bret L. TOLL , Rinat RAPPOPORT , Jesus CORBAL , Dan BAUM , Alexander F. HEINECKE , Elmoustaha OULD-AHMED-VALL , Yuri GEBIL , Raanan SADE

IPC: G06F9/30 , G06F7/485 , G06F7/487 , G06F7/76 , G06F9/38 , G06F17/16

CPC classification number: G06F9/30036 , G06F7/485 , G06F7/4876 , G06F7/762 , G06F9/3001 , G06F9/30032 , G06F9/30043 , G06F9/30109 , G06F9/30112 , G06F9/30134 , G06F9/30145 , G06F9/30149 , G06F9/3016 , G06F9/30185 , G06F9/30196 , G06F9/3818 , G06F9/3836 , G06F17/16 , G06F2212/454

Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile is a set of 2-dimensional registers, are discussed.

37.

Invention publication
APPARATUSES, METHODS, AND SYSTEMS TO PRECISELY MONITOR MEMORY STORE ACCESSES [Under examination - published]

Publication (announcement) number: US20230176870A1

Publication (announcement) date: 2023-06-08

Application number: US18160600

Application date: 2023-01-27

Applicant: Intel Corporation

Inventor: Ahmad YASIN , Raanan SADE , Liron ZUR , Igor YANOVER , Joseph NUZMAN

IPC: G06F9/30 , G06F9/54 , G06F11/34 , G06F11/30

CPC classification number: G06F9/30145 , G06F9/544 , G06F11/348 , G06F9/546 , G06F11/3037 , G06F9/30098

Abstract: Systems, methods, and apparatuses relating to circuitry to precisely monitor memory store accesses are described. In one embodiment, a system includes a memory, a hardware processor core comprising a decoder to decode an instruction into a decoded instruction, an execution circuit to execute the decoded instruction to produce a resultant, a store buffer, and a retirement circuit to retire the instruction when a store request for the resultant from the execution circuit is queued into the store buffer for storage into the memory, and a performance monitoring circuit to mark the retired instruction for monitoring of post-retirement performance information between being queued in the store buffer and being stored in the memory, enable a store fence after the retired instruction to be inserted that causes previous store requests to complete within the memory, and on detection of completion of the store request for the instruction in the memory, store the post-retirement performance information in storage of the performance monitoring circuit.

38.

Invention application
SYSTEMS AND METHODS FOR PERFORMING 16-BIT FLOATING-POINT VECTOR DOT PRODUCT INSTRUCTIONS [Granted]

Publication (announcement) number: US20220326949A1

Publication (announcement) date: 2022-10-13

Application number: US17845103

Application date: 2022-06-21

Applicant: Intel Corporation

Inventor: Alexander F. HEINECKE , Robert VALENTINE , Mark J. CHARNEY , Raanan SADE , Menachem ADELMAN , Zeev SPERBER , Amit GRADSTEIN , Simon RUBANOVICH

IPC: G06F9/30 , G06F9/38

Abstract: Disclosed embodiments relate to systems and methods for performing 16-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply N pairs of 16-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
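
The multiply-and-accumulate behaviour can be modeled in a few lines. This Python sketch makes several assumptions that the abstract does not state: the 16-bit format is bfloat16-style (upper half of a single-precision value), each destination element consumes two consecutive source pairs, and the sum is rounded to single precision once at the end. The function names are hypothetical.

```python
import struct


def bf16_to_float(bits16):
    """Widen an assumed bfloat16 bit pattern to a Python float via single precision."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def round_to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_product_accumulate(dst, src1, src2, pairs_per_element=2):
    """Return dst[i] plus the products of the 16-bit source pairs assigned to element i."""
    out = []
    for i, acc in enumerate(dst):
        for p in range(pairs_per_element):
            k = i * pairs_per_element + p
            acc += bf16_to_float(src1[k]) * bf16_to_float(src2[k])
        # Single final rounding is an assumption; hardware may round differently.
        out.append(round_to_float32(acc))
    return out
```

Keeping the accumulator in single precision while the inputs are 16-bit is the point of the design: input bandwidth is halved while the running sum keeps a wider mantissa.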

39.

Invention application
SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS [Granted]

Publication (announcement) number: US20210318874A1

Publication (announcement) date: 2021-10-14

Application number: US17240882

Application date: 2021-04-26

Applicant: INTEL CORPORATION

Inventor: Bret TOLL , Christopher J. HUGHES , Dan BAUM , Elmoustapha OULD-AHMED-VALL , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE

IPC: G06F9/30

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
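
To make the "group of elements" idea concrete, the sketch below handles only two of the listed group kinds, a full row and a full column, and models the tile as a list of lists. The function names and the string-valued `group` parameter are hypothetical; partial rows, multiple rows or columns, and rectangular sub-tiles are left out.

```python
def move_from_1d(tile, vector, group, index):
    """Copy a 1D vector into one row or one column of a 2D tile, in place."""
    if group == "row":
        if len(vector) != len(tile[index]):
            raise ValueError("vector length must equal the row length")
        tile[index] = list(vector)
    elif group == "column":
        if len(vector) != len(tile):
            raise ValueError("vector length must equal the column length")
        for r, value in enumerate(vector):
            tile[r][index] = value
    else:
        raise ValueError("group must be 'row' or 'column'")
    return tile


def move_to_1d(tile, group, index):
    """Copy one row or one column of a 2D tile out into a new 1D vector."""
    if group == "row":
        return list(tile[index])
    if group == "column":
        return [row[index] for row in tile]
    raise ValueError("group must be 'row' or 'column'")
```

Moving a column is the more interesting case, because its elements are strided in a row-major tile while the 1D vector is contiguous.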

40.

Invention application
SYSTEMS FOR PERFORMING INSTRUCTIONS FOR FAST ELEMENT UNPACKING INTO 2-DIMENSIONAL REGISTERS [Granted]

Publication (announcement) number: US20210216315A1

Publication (announcement) date: 2021-07-15

Application number: US17152160

Application date: 2021-01-19

Applicant: INTEL CORPORATION

Inventor: Bret TOLL , Alexander F. HEINECKE , Christopher J. HUGHES , Ronen ZOHAR , Michael ESPIG , Dan BAUM , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Elmoustapha OULD-AHMED-VALL

IPC: G06F9/30 , G06F12/06 , G06F12/02 , G06F9/38 , G06T1/20

Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
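
The AOS-to-SOA unpacking can be shown with a short Python sketch. It assumes the AOS source is a flat list of N structures with K elements each, so that same-typed elements are separated by a stride of K. The function name `unpack_aos_to_soa` is hypothetical, and element types are not modeled.

```python
def unpack_aos_to_soa(aos, k):
    """Split a flat Array-of-Structures into K groups of N same-typed elements."""
    if k <= 0 or len(aos) % k != 0:
        raise ValueError("source length must be a positive multiple of k")
    # Field f of every structure sits at positions f, f + k, f + 2k, ...
    return [aos[field::k] for field in range(k)]
```

For example, interleaved pixel data `[r0, g0, b0, r1, g1, b1]` with `k=3` becomes three separate channel lists, each holding one element per structure.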
