-
11.
Publication No.: US20240004662A1
Publication Date: 2024-01-04
Application No.: US17856978
Application Date: 2022-07-02
Applicant: Intel Corporation
Inventor: Menachem ADELMAN , Amit GRADSTEIN , Regev SHEMY , Chitra NATARAJAN , Leonardo BORGES , Chytra SHIVASWAMY , Igor ERMOLAEV , Michael ESPIG , Or BEIT AHARON , Jeff WIEDEMEIER
IPC: G06F9/30
CPC classification number: G06F9/30185 , G06F9/30025 , G06F9/30021
Abstract: Techniques for performing horizontal reductions are described. In some examples, an instance of a horizontal instruction is to include at least one field for an opcode, one or more fields to reference a first source operand, and one or more fields to reference a destination operand, wherein the opcode is to indicate that execution circuitry is, in response to a decoded instance of the instruction, to at least perform a horizontal reduction using at least one data element of a non-masked data element position of at least the first source operand and store a result of the horizontal reduction in the destination operand.
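As a rough illustration of the operation this abstract describes, the Python sketch below models a masked horizontal reduction; the function name horizontal_reduce_add, the choice of addition as the reduction operation, and the 0/1 mask encoding are assumptions made for illustration, not details taken from the publication.

    # Hypothetical model of a masked horizontal reduction (sum chosen as the
    # reduction operation for illustration): only data elements in non-masked
    # positions of the source contribute to the single result.
    def horizontal_reduce_add(src, mask):
        result = 0
        for i, element in enumerate(src):
            if mask[i]:                 # 1 = position participates, 0 = masked out
                result += element
        return result                   # written to the destination operand

    # Example: elements at masked positions (mask bit 0) are skipped.
    print(horizontal_reduce_add([3, 7, 2, 5], [1, 0, 1, 1]))  # -> 10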
-
12.
Publication No.: US20220012305A1
Publication Date: 2022-01-13
Application No.: US17485055
Application Date: 2021-09-24
Applicant: Intel Corporation
Inventor: Dan BAUM , Chen KOREN , Elmoustapha OULD-AHMED-VALL , Michael ESPIG , Christopher J. HUGHES , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices and broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in a subsequent multiplication.
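As a loose behavioural model of the multiply-accumulate step this abstract describes, the Python sketch below skips zero elements of both input matrices and accumulates the surviving products into the third matrix; the data layout, the pairing of matching elements by shared inner index, and the function name sparse_matmul_accumulate are illustrative assumptions and do not reflect the actual PE-grid dataflow or broadcast scheme.

    # Hypothetical model: C += A @ B, but a product is formed only when the
    # A-element and the matching B-element (same inner index k) are both
    # non-zero, mirroring the "matching non-zero (NZ) elements" wording.
    def sparse_matmul_accumulate(A, B, C):
        rows, inner, cols = len(A), len(B), len(B[0])
        for i in range(rows):
            for k in range(inner):
                if A[i][k] == 0:          # NZ bitmask for A: skip zero elements
                    continue
                for j in range(cols):
                    if B[k][j] == 0:      # NZ bitmask for B: skip zero elements
                        continue
                    C[i][j] += A[i][k] * B[k][j]
        return C

    A = [[1, 0], [0, 2]]
    B = [[0, 3], [4, 0]]
    C = [[0, 0], [0, 0]]
    print(sparse_matmul_accumulate(A, B, C))  # -> [[0, 3], [8, 0]]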
-
13.
Publication No.: US20200348937A1
Publication Date: 2020-11-05
Application No.: US16934003
Application Date: 2020-07-20
Applicant: Intel Corporation
Inventor: Dan BAUM , Michael ESPIG , James GUILFORD , Wajdi K. FEGHALI , Raanan SADE , Christopher J. HUGHES , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Vinodh GOPAL , Ronen ZOHAR , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
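The first compression variant mentioned above (packing non-zero elements together and recording their positions in a header) can be sketched in a few lines of Python; the header layout as a list of (row, column) pairs and the function name compress_matrix are assumptions for illustration.

    # Hypothetical compression: pack non-zero values contiguously and keep a
    # header listing each value's (row, column) position in the source matrix.
    def compress_matrix(matrix):
        header, packed = [], []
        for r, row in enumerate(matrix):
            for c, value in enumerate(row):
                if value != 0:
                    header.append((r, c))   # position of the non-zero element
                    packed.append(value)    # the element itself, packed densely
        return header, packed

    header, packed = compress_matrix([[0, 5, 0], [7, 0, 0]])
    print(header, packed)  # -> [(0, 1), (1, 0)] [5, 7]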
-
14.
Publication No.: US20200210172A1
Publication Date: 2020-07-02
Application No.: US16233650
Application Date: 2018-12-27
Applicant: Intel Corporation
Inventor: Michael ESPIG , Matthew C. MERTEN , Sean MIRKES
Abstract: A system for processing data flow array instructions is described. The system includes a data flow array, which includes a plurality of processing elements; a decoder to receive a data flow array instruction and generate a set of microinstructions based on the data flow array instruction; a reservation station to receive and dispatch each microinstruction in the set of microinstructions, wherein the set of microinstructions includes a configuration microinstruction for configuring the data flow array for processing the data flow array instruction; a configuration watcher to receive the configuration microinstruction and to add a configuration identifier and a set of parameters of the configuration microinstruction to a configuration queue for the data flow array, wherein the data flow array is to configure the plurality of processing elements based on configuration information associated with the configuration identifier and the set of parameters.
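As a rough model of the configuration path this abstract describes, the Python sketch below has a watcher enqueue a (configuration identifier, parameters) pair and the array consume the queue to configure its processing elements; the class and method names and the queue discipline are illustrative assumptions.

    from collections import deque

    # Hypothetical model: the watcher receives a configuration micro-op and
    # enqueues its (config_id, parameters) pair; the data flow array later
    # pops the queue and configures its processing elements accordingly.
    class ConfigurationWatcher:
        def __init__(self):
            self.config_queue = deque()

        def on_config_uop(self, config_id, parameters):
            self.config_queue.append((config_id, parameters))

        def configure_array(self, array):
            config_id, parameters = self.config_queue.popleft()
            array.configure(config_id, parameters)

    class DataFlowArray:
        def configure(self, config_id, parameters):
            # stand-in for programming the processing elements
            print("configuring PEs with", config_id, parameters)

    watcher = ConfigurationWatcher()
    watcher.on_config_uop(config_id=7, parameters={"lanes": 4})
    watcher.configure_array(DataFlowArray())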
-
15.
Publication No.: US20250004764A1
Publication Date: 2025-01-02
Application No.: US18217544
Application Date: 2023-07-01
Applicant: Intel Corporation
Inventor: Michael ESPIG , Menachem ADELMAN , Jonathan COMBS , Amit GRADSTEIN , Christopher J. HUGHES , Vivekananthan SANJEEPAN , Wing Shek WONG
IPC: G06F9/30
Abstract: Techniques for providing operands of 512 bits or smaller are described. In some examples, a prefix of an instruction is utilized to define the operand (vector) length. For example, an instruction is to at least include fields for a prefix, an opcode, and operand addressing information, wherein the prefix and addressing information are to be used by decoder circuitry to determine support for a particular vector length for one or more operands of the instance of the instruction, and the opcode is to indicate one or more operations to perform on the one or more operands.
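A small Python sketch of the kind of prefix-driven length decode this abstract describes is given below; the prefix encoding, the supported-length set, and the function name decode_vector_length are assumptions made for illustration, not the publication's actual encoding.

    # Hypothetical mapping from an instruction-prefix field to an operand
    # (vector) length in bits, plus the sort of support check the decoder
    # circuitry would perform before accepting the instruction.
    PREFIX_TO_VECTOR_BITS = {0b00: 128, 0b01: 256, 0b10: 512}   # assumed encoding
    SUPPORTED_VECTOR_BITS = {128, 256, 512}                      # assumed capability

    def decode_vector_length(prefix_bits):
        length = PREFIX_TO_VECTOR_BITS.get(prefix_bits)
        if length is None or length not in SUPPORTED_VECTOR_BITS:
            raise ValueError("vector length not supported for this instruction")
        return length

    print(decode_vector_length(0b10))  # -> 512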
-
16.
Publication No.: US20240078285A1
Publication Date: 2024-03-07
Application No.: US18502291
Application Date: 2023-11-06
Applicant: Intel Corporation
Inventor: Dan BAUM , Chen KOREN , Elmoustapha OULD-AHMED-VALL , Michael ESPIG , Christopher J. HUGHES , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE
CPC classification number: G06F17/16 , G06F9/3001 , G06F9/30101 , G06F9/3016 , G06F9/3802
Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices and broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in a subsequent multiplication.
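This abstract is shared with the earlier sparse-matrix entry, whose sketch covered the multiply-accumulate step; the short Python sketch below instead illustrates the NZ-bitmask generation it mentions, with the per-row bit layout chosen purely for illustration.

    # Hypothetical per-row NZ bitmask generation: bit k of a row's mask is set
    # when element k of that row is non-zero, so matching elements can be
    # paired cheaply before any multiplication is issued.
    def nz_bitmask(row):
        mask = 0
        for k, value in enumerate(row):
            if value != 0:
                mask |= 1 << k
        return mask

    print(bin(nz_bitmask([0, 3, 0, 7])))  # -> 0b1010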
-
17.
Publication No.: US20220171627A1
Publication Date: 2022-06-02
Application No.: US17672253
Application Date: 2022-02-15
Applicant: Intel Corporation
Inventor: Dan BAUM , Michael ESPIG , James GUILFORD , Wajdi K. FEGHALI , Raanan SADE , Christopher J. HUGHES , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Vinodh GOPAL , Ronen ZOHAR , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
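This abstract repeats the earlier compress/decompress entry, for which a compression sketch was given; the Python sketch below shows the matching decompression of that assumed header-plus-packed-values format, again purely as an illustration.

    # Hypothetical inverse of the header + packed-values format sketched
    # earlier: zero-fill the destination, then scatter each packed value back
    # to the (row, column) recorded for it in the header.
    def decompress_matrix(header, packed, rows, cols):
        matrix = [[0] * cols for _ in range(rows)]
        for (r, c), value in zip(header, packed):
            matrix[r][c] = value
        return matrix

    print(decompress_matrix([(0, 1), (1, 0)], [5, 7], rows=2, cols=3))
    # -> [[0, 5, 0], [7, 0, 0]]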
-
18.
Publication No.: US20200334016A1
Publication Date: 2020-10-22
Application No.: US16919022
Application Date: 2020-07-01
Applicant: Intel Corporation
Inventor: Aditya VARMA , Michael ESPIG
Abstract: An apparatus and method for efficiently performing a multiply-add or multiply-accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range.
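As a loose illustration of selecting between two multiplier paths by operand precision range, the Python sketch below dispatches on a magnitude cutoff; the cutoff value, the narrow/full split, and all function names are assumptions and not the publication's actual criteria.

    # Hypothetical dispatch between two multiplier paths based on the
    # operands' precision range; the ranges are illustrative assumptions.
    NARROW_RANGE_LIMIT = 2 ** 11   # assumed cutoff for the reduced-precision path

    def fused_multiply_add(multiplier, multiplicand, addend):
        if abs(multiplier) < NARROW_RANGE_LIMIT and abs(multiplicand) < NARROW_RANGE_LIMIT:
            product = narrow_multiply(multiplier, multiplicand)   # first circuitry
        else:
            product = full_multiply(multiplier, multiplicand)     # second circuitry
        return product + addend

    def narrow_multiply(a, b):
        return a * b   # stand-in for the reduced-precision multiplier array

    def full_multiply(a, b):
        return a * b   # stand-in for the full-precision multiplier array

    print(fused_multiply_add(3.0, 4.0, 1.0))  # -> 13.0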
-
19.
Publication No.: US20200210182A1
Publication Date: 2020-07-02
Application No.: US16232931
Application Date: 2018-12-26
Applicant: Intel Corporation
Inventor: Christopher J. HUGHES , Michael ESPIG , Dan BAUM , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL
Abstract: Disclosed embodiments relate to systems and methods for performing duplicate detection instructions on two-dimensional (2D) data. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode and locations of a source matrix comprising M×N elements and a destination, the opcode to indicate execution circuitry is to use a plurality of comparators to discover duplicates in the source matrix, and store indications of locations of discovered duplicates in the destination. The execution circuitry is to execute the decoded instruction as per the opcode.
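A minimal Python model of the duplicate-detection semantics is sketched below; the comparison strategy (a single pass with a seen-value map rather than a comparator array) and the output as a list of (row, column) locations are assumptions for illustration.

    # Hypothetical model of duplicate detection on a 2D source: record the
    # location of every element whose value has already been seen elsewhere
    # in the matrix, as would be written to the destination.
    def find_duplicates(matrix):
        seen, duplicate_locations = {}, []
        for r, row in enumerate(matrix):
            for c, value in enumerate(row):
                if value in seen:
                    duplicate_locations.append((r, c))   # location of a duplicate
                else:
                    seen[value] = (r, c)
        return duplicate_locations

    print(find_duplicates([[1, 2], [2, 3]]))  # -> [(1, 0)]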