-
Publication number: US20220197635A1
Publication date: 2022-06-23
Application number: US17132464
Application date: 2020-12-23
Applicant: Intel Corporation
Inventor: Deepti AGGARWAL, Michael ESPIG, Chekib NOUIRA, Robert VALENTINE, Mark CHARNEY
Abstract: In an embodiment, a processor includes: a fetch circuit to fetch instructions, the instructions including a sum of squared differences (SSD) instruction; a decode circuit to decode the SSD instruction; and an execution circuit to, during an execution of the decoded SSD instruction, generate an SSD output vector based on a plurality of input vectors, the SSD output vector including a plurality of squared differences values. Other embodiments are described and claimed.
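The arithmetic behind the SSD output vector can be sketched in a few lines. This is only an illustration: the abstract does not say how many input vectors there are, how wide the elements are, or whether the squared differences are also summed, so the two-input form and the function names below are assumptions.

```python
def squared_differences(a, b):
    """Return the element-wise squared differences of two equal-length vectors.

    Assumption: two input vectors; the abstract only says "a plurality".
    """
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    return [(x - y) ** 2 for x, y in zip(a, b)]


def sum_of_squared_differences(a, b):
    """Reduce the squared differences to one scalar (hypothetical helper)."""
    return sum(squared_differences(a, b))
```

For example, `squared_differences([1, 2, 3], [3, 2, 0])` gives `[4, 0, 9]`, and the reduced sum is `13`.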
-
Publication number: US20240045690A1
Publication date: 2024-02-08
Application number: US18460497
Application date: 2023-09-01
Applicant: Intel Corporation
Inventor: Dan BAUM, Michael ESPIG, James GUILFORD, Wajdi K. FEGHALI, Raanan SADE, Christopher J. HUGHES, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Vinodh GOPAL, Ronen ZOHAR, Alexander F. HEINECKE
CPC classification number: G06F9/30178, G06F9/30145, G06F9/30036, G06F9/3013, G06F9/3802
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
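The first compression option in the abstract (pack the non-zero elements together and record their positions in a header) can be sketched as below. The header layout, the `(row, column)` position format, and the function names are assumptions for illustration; the abstract does not define them.

```python
def compress(matrix):
    """Pack non-zero elements and record each one's (row, col) in a header.

    Assumption: the header is a plain list of positions; a real encoding
    would likely use a bitmap or packed indices.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    header, values = [], []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                header.append((r, c))
                values.append(value)
    return (rows, cols), header, values


def decompress(shape, header, values):
    """Rebuild the dense matrix by scattering the packed values."""
    rows, cols = shape
    matrix = [[0] * cols for _ in range(rows)]
    for (r, c), value in zip(header, values):
        matrix[r][c] = value
    return matrix
```

Compressing `[[0, 5, 0], [7, 0, 0]]` keeps only the values `[5, 7]` with header `[(0, 1), (1, 0)]`, and `decompress` restores the original matrix.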
-
Publication number: US20240004648A1
Publication date: 2024-01-04
Application number: US17856981
Application date: 2022-07-02
Applicant: Intel Corporation
Inventor: Venkateswara Rao MADDURI, Jason BRANDT, Jeff WIEDEMEIER, Michael ESPIG
IPC: G06F9/30
CPC classification number: G06F9/30036, G06F9/30185, G06F9/30098
Abstract: Techniques for vector unpacking are described. In some examples, a single instruction is executed to perform vector unpacking. In some examples, the instruction is to include one or more fields for an opcode, a destination operand identifier, a first source operand identifier, a second source operand identifier, and an immediate, wherein the opcode is to indicate that execution circuitry is to interleave data elements from the identified first and second source operands according to an encoding of the immediate, wherein the encoding of the immediate is to include multiple controls, with each control dictating what is to be written into a particular data element position of the identified destination operand.
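A rough sketch of immediate-controlled interleaving follows. The abstract does not give the control encoding, so everything concrete here is hypothetical: four-element vectors, one 3-bit control per destination position, with bit 2 selecting the source operand and bits 1:0 selecting the element index.

```python
def encode_controls(controls):
    """Pack (source_select, element_index) pairs into an immediate.

    Hypothetical encoding: 3 bits per destination position.
    """
    imm = 0
    for pos, (src, idx) in enumerate(controls):
        imm |= ((src << 2) | idx) << (3 * pos)
    return imm


def unpack_interleave(src1, src2, imm):
    """Fill each destination position as dictated by its control field."""
    if len(src1) != 4 or len(src2) != 4:
        raise ValueError("this sketch assumes four-element vectors")
    dest = []
    for pos in range(4):
        control = (imm >> (3 * pos)) & 0b111
        source = src2 if control & 0b100 else src1  # bit 2: which source
        dest.append(source[control & 0b011])        # bits 1:0: which element
    return dest
```

With controls `[(0, 0), (1, 0), (0, 1), (1, 1)]`, the call `unpack_interleave([1, 2, 3, 4], [10, 20, 30, 40], imm)` yields the classic low interleave `[1, 10, 2, 20]`.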
-
Publication number: US20200210517A1
Publication date: 2020-07-02
Application number: US16234374
Application date: 2018-12-27
Applicant: Intel Corporation
Inventor: Dan BAUM, Chen KOREN, Elmoustapha OULD-AHMED-VALL, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
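The idea of matching non-zero elements through bitmasks can be sketched in software as follows. This is a simplified model: it ignores the PE grid, the two-elements-at-a-time broadcast, and the stored NZ element, and the function names are assumptions.

```python
def nz_bitmask(vec):
    """Return an integer whose bit i is set when vec[i] is non-zero."""
    mask = 0
    for i, value in enumerate(vec):
        if value != 0:
            mask |= 1 << i
    return mask


def sparse_matmul_accumulate(a, b, c):
    """Compute c += a @ b, multiplying only where both operands are non-zero."""
    inner = len(b)
    b_cols = [[b[k][j] for k in range(inner)] for j in range(len(b[0]))]
    a_masks = [nz_bitmask(row) for row in a]
    b_masks = [nz_bitmask(col) for col in b_cols]
    for i, row in enumerate(a):
        for j, col in enumerate(b_cols):
            matching = a_masks[i] & b_masks[j]  # positions non-zero in both
            k = 0
            while matching:
                if matching & 1:
                    c[i][j] += row[k] * col[k]
                matching >>= 1
                k += 1
    return c
```

For `a = [[1, 0], [0, 2]]`, `b = [[0, 3], [4, 0]]`, and `c` initialized to all ones, the result is `[[1, 4], [9, 1]]`; the two output positions whose masks do not overlap perform no multiplications at all.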
-
Publication number: US20190042257A1
Publication date: 2019-02-07
Application number: US16144902
Application date: 2018-09-27
Applicant: Intel Corporation
Inventor: Dan BAUM, Michael ESPIG, James GUILFORD, Wajdi K. FEGHALI, Raanan SADE, Christopher J. HUGHES, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Vinodh GOPAL, Ronen ZOHAR, Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
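The second compression option in the abstract (represent some elements with fewer bits and flag them in the header) might look like the sketch below. The 4-bit and 8-bit widths, the one-flag-per-element header, and the function names are all assumptions; the matrix is treated as a flat list of unsigned elements for brevity.

```python
def compress_narrow(elements, narrow_bits=4):
    """Split elements into narrow and wide streams, flagged by a header.

    header[i] is True when elements[i] fits in narrow_bits (assumed layout).
    """
    limit = 1 << narrow_bits
    header = [0 <= e < limit for e in elements]
    narrow = [e for e, fits in zip(elements, header) if fits]
    wide = [e for e, fits in zip(elements, header) if not fits]
    return header, narrow, wide


def decompress_narrow(header, narrow, wide):
    """Merge the two streams back into the original element order."""
    narrow_iter, wide_iter = iter(narrow), iter(wide)
    return [next(narrow_iter) if fits else next(wide_iter) for fits in header]


def compressed_size_bits(header, narrow, wide, narrow_bits=4, wide_bits=8):
    """Count bits used: one header flag per element plus both payloads."""
    return len(header) + narrow_bits * len(narrow) + wide_bits * len(wide)
```

For the five 8-bit elements `[3, 200, 0, 15, 16]`, three fit in 4 bits, so the sketch uses 33 bits instead of 40.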
-
Publication number: US20250028533A1
Publication date: 2025-01-23
Application number: US18224919
Application date: 2023-07-21
Applicant: Intel Corporation
Inventor: John MORGAN, Michael ESPIG, Deepti AGGARWAL
IPC: G06F9/30
Abstract: Techniques for zero clearing scalar moves are described. For example, one or more instructions are supported which, when executed, are to cause a scalar move of a 16-bit or 32-bit floating-point value from a source to a destination. When the destination is a vector register, all other data elements are to be zeroed.
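The effect of a zero-clearing scalar move on a vector register can be modeled with Python's `struct` module. The 128-bit register width, the little-endian layout, and placing the scalar in the lowest element are assumptions made for this sketch; the abstract only states that all other data elements are zeroed.

```python
import struct


def scalar_move_zeroing(value, fmt="e", vector_bytes=16):
    """Model a scalar FP move into a vector register with zeroed upper bytes.

    fmt "e" packs an IEEE 754 half (16-bit) value, "f" a single (32-bit) one.
    Assumption: the scalar occupies the lowest element of the register.
    """
    if fmt not in ("e", "f"):
        raise ValueError("only 16-bit ('e') and 32-bit ('f') floats are modeled")
    scalar = struct.pack("<" + fmt, value)
    return scalar + bytes(vector_bytes - len(scalar))
```

Calling `scalar_move_zeroing(1.0, "e")` returns 16 bytes whose first two bytes hold the half-precision encoding of 1.0 and whose remaining 14 bytes are zero.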
-
Publication number: US20210216315A1
Publication date: 2021-07-15
Application number: US17152160
Application date: 2021-01-19
Applicant: Intel Corporation
Inventor: Bret TOLL, Alexander F. HEINECKE, Christopher J. HUGHES, Ronen ZOHAR, Michael ESPIG, Dan BAUM, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Elmoustapha OULD-AHMED-VALL
Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
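The AOS-to-SOA unpacking described above amounts to a strided gather. A minimal sketch, assuming the AOS source is a flat sequence of N structures with K elements each (so same-typed elements are K positions apart); the function name is hypothetical:

```python
def aos_to_soa(aos, k):
    """Split a flat array of K-element structures into K same-typed groups."""
    if k <= 0 or len(aos) % k:
        raise ValueError("length must be a multiple of the structure size k")
    # Group f collects element f of every structure (stride k).
    return [list(aos[field::k]) for field in range(k)]
```

For example, `aos_to_soa([1, "a", 1.5, 2, "b", 2.5], 3)` separates N = 2 structures of K = 3 elements into `[[1, 2], ["a", "b"], [1.5, 2.5]]`.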
-
Publication number: US20200210188A1
Publication date: 2020-07-02
Application number: US16233546
Application date: 2018-12-27
Applicant: Intel Corporation
Inventor: Elmoustapha OULD-AHMED-VALL, Jonathan D. PEARCE, Dan BAUM, Guei-Yuan LUEH, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to systems and methods for performing matrix row-wise and column-wise permute instructions. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction, the instruction having fields to specify an opcode and locations of a source matrix and a destination matrix, the opcode indicating the processor is to perform a permutation by copying, into each of a plurality of equal-sized logical partitions of the destination matrix, a selected logical partition of a same size from the source matrix, the selection being indicated by a permute control, and execution circuitry to execute the decoded instruction as per the opcode.
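A row-wise or column-wise permute of this kind can be sketched by treating each row (or column) as one logical partition. The form of the permute control, a list of source partition indices with one entry per destination partition, is an assumption; the abstract does not specify it.

```python
def permute_rows(src, control):
    """Destination row i is a copy of source row control[i]."""
    return [list(src[selected]) for selected in control]


def permute_cols(src, control):
    """Destination column j is a copy of source column control[j]."""
    return [[row[selected] for selected in control] for row in src]
```

The same source partition may be selected more than once: `permute_rows([[1, 2], [3, 4], [5, 6]], [2, 0, 0])` gives `[[5, 6], [1, 2], [1, 2]]`.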
-
Publication number: US20200210173A1
Publication date: 2020-07-02
Application number: US16232599
Application date: 2018-12-26
Applicant: Intel Corporation
Inventor: Elmoustapha OULD-AHMED-VALL, Jonathan D. PEARCE, Dan BAUM, Guei-Yuan LUEH, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods for performing nibble-sized operations on matrix elements. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction, the fetched instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode to indicate the processor is to, for each pair of corresponding elements of the first and second source matrices, logically partition each element into nibble-sized partitions, perform an operation indicated by the instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix. The exemplary processor includes execution circuitry to execute the decoded instruction as per the opcode.
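The per-nibble processing described above can be sketched as follows. The 8-bit element width and the choice to truncate each result to 4 bits (so nothing carries between nibbles) are assumptions of this sketch, as are the function names.

```python
def nibble_op(x, y, op, element_bits=8):
    """Apply op to each pair of corresponding 4-bit partitions of x and y."""
    result = 0
    for shift in range(0, element_bits, 4):
        nx = (x >> shift) & 0xF
        ny = (y >> shift) & 0xF
        # Assumption: results wrap to 4 bits and stay in their own nibble.
        result |= (op(nx, ny) & 0xF) << shift
    return result


def nibble_matrix_op(a, b, op, element_bits=8):
    """Apply nibble_op to every pair of corresponding matrix elements."""
    return [
        [nibble_op(x, y, op, element_bits) for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]
```

With addition as the operation, `nibble_op(0x1F, 0x21, lambda p, q: p + q)` returns `0x30`: the low nibbles wrap around to 0 and the high nibbles sum to 3, with no carry crossing the nibble boundary.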
-
Publication number: US20240220248A1
Publication date: 2024-07-04
Application number: US18091318
Application date: 2022-12-29
Applicant: Intel Corporation
Inventor: Vivekananthan SANJEEPAN, Gilbert NEIGER, Michael ESPIG
IPC: G06F9/30
CPC classification number: G06F9/30036, G06F9/30181
Abstract: Techniques to restrict vector length in a processor are described. A method of an aspect that may be performed by a processor includes executing first instances of vector instructions having respective opcode values regardless of whether they specify wider vectors of a wider vector width or narrower vectors of a narrower vector width, when a control value is a first value. The method also includes executing second instances of vector instructions having the respective opcode values when they specify narrower vectors of the narrower vector width, but do not specify wider vectors of the wider vector width, when the control value is a second different value. The method also includes preventing execution of third instances of vector instructions having the respective opcode values when they specify wider vectors of the wider vector width, when the control value is the second value. Other methods, processors, and systems are disclosed.
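The gating behavior in this abstract reduces to a small decision rule, sketched below. The concrete widths (256 and 512 bits), the boolean form of the control value, and the use of an exception to model "preventing execution" are assumptions; the abstract does not say how prevention is signaled.

```python
NARROW_BITS = 256  # assumed narrower vector width
WIDE_BITS = 512    # assumed wider vector width


class VectorWidthRestricted(Exception):
    """Hypothetical signal that a wide-vector instruction was blocked."""


def execute_vector(opcode, vector_bits, restricted):
    """Execute unless the control value restricts wide vectors."""
    if vector_bits not in (NARROW_BITS, WIDE_BITS):
        raise ValueError("unsupported vector width in this sketch")
    if restricted and vector_bits == WIDE_BITS:
        # Control value is the second value: wide instances are prevented.
        raise VectorWidthRestricted(f"{opcode} at {vector_bits} bits is blocked")
    return (opcode, vector_bits)
```

The same opcode therefore runs at either width when `restricted` is false, and only at the narrower width when it is true.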