-
Publication No.: US20210096822A1
Publication Date: 2021-04-01
Application No.: US17121155
Filing Date: 2020-12-14
Applicant: INTEL CORPORATION
Inventor: Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Simon RUBANOVICH , Amit GRADSTEIN , Zeev SPERBER , Bret TOLL , Jesus CORBAL , Christopher J. HUGHES , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL
IPC: G06F7/78 , G06F9/30 , G06F15/173 , G06F9/38
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.
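Illustrative sketch (not taken from the patent): the dual-tile transpose described in this abstract can be modeled in software as below. The function names `transpose_tile` and `transpose_pair` are hypothetical, and tiles are assumed to be plain lists of equal-length rows.

```python
def transpose_tile(src):
    """Return the transpose of a rectangular matrix given as a list of rows."""
    if not src:
        return []
    rows, cols = len(src), len(src[0])
    # Row i of the source becomes column i of the destination.
    return [[src[i][j] for i in range(rows)] for j in range(cols)]


def transpose_pair(src1, src2):
    """Model one instruction that transposes two source tiles into two destinations."""
    return transpose_tile(src1), transpose_tile(src2)


dst1, dst2 = transpose_pair([[1, 2, 3], [4, 5, 6]],
                            [[7, 8], [9, 10], [11, 12]])
```

Here a 2x3 source yields a 3x2 destination and a 3x2 source yields a 2x3 destination, so neither tile has to be square.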
-
Publication No.: US20210081200A1
Publication Date: 2021-03-18
Application No.: US16642766
Filing Date: 2017-09-27
Applicant: Intel Corporation
Inventor: Venkateswara R. MADDURI , Carl MURRAY , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Robert VALENTINE , Jesus CORBAL
Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate a signed fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
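Illustrative sketch (not taken from the patent): one way to model a signed multiply that keeps the rounded most significant half of a double-sized product. The abstract does not state the rounding mode, so round-half-up (adding half of the discarded range before shifting) is an assumption, as are the function names and the default 16-bit element width.

```python
def mul_high_round_signed(a, b, bits=16):
    """Signed multiply returning the rounded upper `bits` of the 2*bits product."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operand does not fit in a signed element")
    product = a * b  # fits in 2*bits bits
    # Assumed rounding: add half of the discarded low part, then shift.
    # Python's >> on negative ints is an arithmetic (floor) shift.
    return (product + (1 << (bits - 1))) >> bits


def packed_mul_high_round_signed(src1, src2, bits=16):
    """Apply the scalar operation to each pair of corresponding elements."""
    return [mul_high_round_signed(a, b, bits) for a, b in zip(src1, src2)]


result = packed_mul_high_round_signed([100, 100, 16], [100, -100, 8], bits=8)
```

Under these assumptions the result always fits in a signed element of the original width, so no saturation step is needed.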
-
Publication No.: US20210072985A1
Publication Date: 2021-03-11
Application No.: US16642778
Filing Date: 2017-09-27
Applicant: Intel Corporation
Inventor: Venkateswara R. MADDURI , Carl MURRAY , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Robert VALENTINE , Jesus CORBAL
Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate an unsigned fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
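Illustrative sketch (not taken from the patent): the unsigned variant differs only in the operand range. As an assumption, rounding is again modeled as round-half-up; the name `mul_high_round_unsigned` and the 16-bit default width are hypothetical.

```python
def mul_high_round_unsigned(a, b, bits=16):
    """Unsigned multiply returning the rounded upper `bits` of the 2*bits product."""
    limit = 1 << bits
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operand does not fit in an unsigned element")
    product = a * b  # at most (2**bits - 1)**2, fits in 2*bits bits
    # Assumed rounding: add half of the discarded low part, then shift.
    return (product + (1 << (bits - 1))) >> bits


unsigned_result = [mul_high_round_unsigned(a, b, bits=8)
                   for a, b in zip([255, 200, 16], [255, 3, 8])]
```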
-
Publication No.: US20200233665A1
Publication Date: 2020-07-23
Application No.: US16487747
Filing Date: 2017-07-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Zeev SPERBER , Mark J. CHARNEY , Bret L. TOLL , Jesus CORBAL , Dan BAUM , Alexander HEINECKE , Elmoustapha OULD-AHMED-VALL
Abstract: Detailed herein are embodiment systems, processors, and methods for matrix move. For example, a processor comprising decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to move each data element of the identified source matrix operand to a corresponding data element position of the identified destination matrix operand is described.
-
Publication No.: US20200210188A1
Publication Date: 2020-07-02
Application No.: US16233546
Filing Date: 2018-12-27
Applicant: Intel Corporation
Inventor: Elmoustapha OULD-AHMED-VALL , Jonathan D. PEARCE , Dan BAUM , Guei-Yuan LUEH , Michael ESPIG , Christopher J. HUGHES , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to systems and methods for performing matrix row-wise and column-wise permute instructions. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode and locations of a source matrix and a destination matrix, the opcode indicating the processor is to perform a permutation by copying, into each of a plurality of equal-sized logical partitions of the destination matrix, a selected logical partition of a same size from the source matrix, the selection being indicated by a permute control, and execution circuitry to execute the decoded instruction as per the opcode.
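Illustrative sketch (not taken from the patent): a row-wise and a column-wise permute, assuming each logical partition is a single row or a single column and the permute control is a list of source indices, one per destination partition. The names `permute_rows` and `permute_cols` are hypothetical.

```python
def permute_rows(src, control):
    """Destination row d is a copy of source row control[d]."""
    return [list(src[k]) for k in control]


def permute_cols(src, control):
    """Destination column d is a copy of source column control[d]."""
    return [[row[k] for k in control] for row in src]


rows_out = permute_rows([[1, 2], [3, 4], [5, 6]], [2, 0, 0])
cols_out = permute_cols([[1, 2, 3], [4, 5, 6]], [2, 2, 0])
```

A source partition may be selected more than once, so the operation can broadcast as well as reorder.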
-
Publication No.: US20200210173A1
Publication Date: 2020-07-02
Application No.: US16232599
Filing Date: 2018-12-26
Applicant: Intel Corporation
Inventor: Elmoustapha OULD-AHMED-VALL , Jonathan D. PEARCE , Dan BAUM , Guei-Yuan LUEH , Michael ESPIG , Christopher J. HUGHES , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods for performing nibble-sized operations on matrix elements. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction, the fetched instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode to indicate the processor is to, for each pair of corresponding elements of the first and second source matrices, logically partition each element into nibble-sized partitions, perform an operation indicated by the instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix. The exemplary processor includes execution circuitry to execute the decoded instruction as per the opcode.
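Illustrative sketch (not taken from the patent): a per-nibble operation on matrix elements. It assumes 8-bit elements by default and that each nibble result wraps modulo 16 with no carry into the neighboring nibble; the abstract does not specify overflow behavior, and the function names are hypothetical.

```python
import operator


def nibble_op(a, b, op, bits=8):
    """Apply `op` independently to each 4-bit partition of a and b."""
    result = 0
    for shift in range(0, bits, 4):
        x = (a >> shift) & 0xF
        y = (b >> shift) & 0xF
        # Assumed behavior: keep only the low 4 bits of each partial result.
        result |= (op(x, y) & 0xF) << shift
    return result


def matrix_nibble_op(src1, src2, op, bits=8):
    """Apply nibble_op to every pair of corresponding matrix elements."""
    return [[nibble_op(a, b, op, bits) for a, b in zip(r1, r2)]
            for r1, r2 in zip(src1, src2)]


dst = matrix_nibble_op([[0x1F, 0xA5]], [[0x21, 0x3C]], operator.add)
```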
-
Publication No.: US20200065352A1
Publication Date: 2020-02-27
Application No.: US16487421
Filing Date: 2017-07-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Mark J. CHARNEY , Elmoustapha OULD-AHMED-VALL , Dan BAUM , Zeev SPERBER , Jesus CORBAL , Bret L. TOLL , Raanan SADE , Igor YANOVER , Yuri GEBIL , Rinat RAPPOPORT , Stanislav SHWARTSMAN , Menachem ADELMAN , Simon RUBANOVICH
Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile is a set of 2-dimensional registers, are discussed.
-
Publication No.: US20190258481A1
Publication Date: 2019-08-22
Application No.: US16398200
Filing Date: 2019-04-29
Applicant: Intel Corporation
Inventor: Edward T. GROCHOWSKI , Asit K. MISHRA , Robert VALENTINE , Mark J. CHARNEY , Simon C. STEELY, JR.
Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
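Illustrative sketch (not taken from the patent): a resumable matrix multiplication in which a completion progress indicator records how many result rows have been stored. Tracking progress per output row and simulating the interruption with a `budget` argument are both assumptions made for illustration; the name `matmul_resumable` is hypothetical.

```python
def matmul_resumable(a, b, c, progress=0, budget=None):
    """Compute c = a @ b one output row at a time, starting at row `progress`.

    `budget` limits how many rows are computed before a simulated interruption.
    Returns the updated progress indicator (number of completed rows).
    """
    inner, n_cols = len(b), len(b[0])
    done = 0
    while progress < len(a):
        if budget is not None and done >= budget:
            break  # simulated interruption: stop and report progress
        c[progress] = [sum(a[progress][k] * b[k][j] for k in range(inner))
                       for j in range(n_cols)]
        progress += 1
        done += 1
    return progress


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[0, 0], [0, 0]]
p = matmul_resumable(a, b, c, budget=1)      # interrupted after one row
partial = [row[:] for row in c]
p_final = matmul_resumable(a, b, c, progress=p)  # resume from saved progress
```

Resuming from the saved indicator avoids recomputing rows that were already stored before the interruption.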
-
Publication No.: US20190163476A1
Publication Date: 2019-05-30
Application No.: US15824902
Filing Date: 2017-11-28
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Mark J. CHARNEY , Raanan SADE , Elmoustapha OULD-AHMED-VALL , Jesus CORBAL
IPC: G06F9/30
Abstract: Implementations detailed herein include, but are not limited to, an apparatus having instruction execution circuitry to execute a decoded instruction having at least one operand utilizing half-precision floating point data and a register to store control information about the at least one operand utilizing half-precision floating point data, wherein the control information is to dictate when underflowing operations of execution of the instruction are to be flushed to zero and when denormal inputs of the instruction are to be zeroed.
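Illustrative sketch (not taken from the patent): flush-to-zero and denormals-are-zero behavior for a half-precision multiply. The two control bits are modeled as hypothetical keyword arguments `ftz` and `daz`. Treating any denormal result as an underflow to be flushed, and keeping the sign of the flushed zero, are assumptions.

```python
import math
import struct

MIN_NORMAL_FP16 = 2.0 ** -14  # smallest positive normal half-precision value


def to_fp16(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def flush_if_denormal(x):
    """Replace a denormal value with a zero of the same sign."""
    if x != 0.0 and abs(x) < MIN_NORMAL_FP16:
        return math.copysign(0.0, x)
    return x


def fp16_mul(a, b, ftz=False, daz=False):
    """Half-precision multiply with optional input and output flushing."""
    a, b = to_fp16(a), to_fp16(b)
    if daz:  # zero denormal inputs before the operation
        a, b = flush_if_denormal(a), flush_if_denormal(b)
    result = to_fp16(a * b)
    if ftz:  # flush an underflowing (denormal) result to zero
        result = flush_if_denormal(result)
    return result
```

For example, `fp16_mul(2.0**-10, 2.0**-10)` produces the denormal value 2^-20, while the same call with `ftz=True` produces zero.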
-
Publication No.: US20190102181A1
Publication Date: 2019-04-04
Application No.: US15721361
Filing Date: 2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara MADDURI , Elmoustapha OULD-AHMED-VALL , Robert VALENTINE , Mark CHARNEY
IPC: G06F9/30
Abstract: An apparatus and method for performing left-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a left-shift instruction to generate a decoded left-shift instruction; a first source register to store a plurality of packed quadword data elements; execution circuitry to execute the decoded left-shift instruction, the execution circuitry comprising shift circuitry to left-shift at least first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, to generate first and second left-shifted quadwords; the execution circuitry to cause selection of a specified set of most significant bits of the first and second left-shifted quadwords to be written to least significant bit regions of first and second quadword data element locations, respectively, of a destination register; and the destination register to store the specified set of the most significant bits of the first and second left-shifted quadwords.
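Illustrative sketch (not taken from the patent): left-shifting packed 64-bit elements and writing a selected group of most significant bits into the low bits of each destination element. The abstract does not give the width of the selected group or say whether the shifted value is truncated to 64 bits first. Truncation to 64 bits, a hypothetical `keep` parameter defaulting to 32 bits, and zeroing of the remaining destination bits are all assumptions.

```python
MASK64 = (1 << 64) - 1


def shl_select_high(x, amount, keep=32):
    """Left-shift a quadword, then move its top `keep` bits to the low end."""
    if not 0 < keep <= 64:
        raise ValueError("keep must be between 1 and 64")
    shifted = (x << amount) & MASK64  # assumed: bits shifted past bit 63 are lost
    return shifted >> (64 - keep)     # upper destination bits are left as zero


def packed_shl_select_high(src, amount, keep=32):
    """Apply the operation to every quadword element of a packed source."""
    return [shl_select_high(x, amount, keep) for x in src]


out = packed_shl_select_high([0x0000000012345678, 0x00000000FFFFFFFF], 16)
```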