-
Publication No.: US12153899B2
Publication date: 2024-11-26
Application No.: US17133363
Filing date: 2020-12-23
Applicant: Intel Corporation
Inventor: Menachem Adelman, Robert Valentine, Daniel Towner, Amit Gradstein, Mark Jay Charney
Abstract: An apparatus and method for complex matrix transpose and multiply. For example, one embodiment of a processor comprises: a decoder to decode a first complex matrix multiplication and transpose instruction including a first source operand to identify a first plurality of real and imaginary values in a first complex source matrix, a second source operand to identify a second plurality of real and imaginary values in a second complex source matrix, and a first destination operand to identify a result matrix with real and imaginary values; execution circuitry to execute the first complex matrix transpose and multiplication instruction, the execution circuitry comprising transpose hardware logic to transpose at least one of the source matrices, parallel multiplication circuitry to multiply real values from the first plurality of real and imaginary values with corresponding real values from the second plurality of real and imaginary values to generate a first plurality of real products, to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the result matrix.
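
The real-part computation described in this abstract can be illustrated in software. The following is only a sketch under stated assumptions: the function names are hypothetical, the matrices are stored as separate real and imaginary planes, and the first source is the one transposed (the abstract only says "at least one of the source matrices"). It models the arithmetic, not the hardware.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def transpose_multiply_real(a_re, a_im, b_re, b_im):
    """Real part of (A^T x B), with A and B given as real/imaginary planes."""
    # Assumption: the first source matrix is the one that gets transposed.
    at_re, at_im = transpose(a_re), transpose(a_im)
    rows, inner, cols = len(at_re), len(b_re), len(b_re[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                rr = at_re[i][k] * b_re[k][j]  # real x real product
                ii = at_im[i][k] * b_im[k][j]  # imaginary x imaginary product
                out[i][j] += rr - ii           # subtract, then accumulate
    return out
```

The subtraction follows from i² = -1: the product of two imaginary components contributes negatively to the real part of the result.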
-
Publication No.: US20230205606A1
Publication date: 2023-06-29
Application No.: US17922277
Filing date: 2021-03-26
Applicant: Intel Corporation
Inventor: Stephen Palermo, Neelam Chandwani, Kshitij Doshi, Chetan Hiremath, Rajesh Gadiyar, Udayan Mukherjee, Daniel Towner, Valerie Parker, Shubha Bommalingaiahnapallya, Rany ElSayed
IPC: G06F9/50
CPC classification number: G06F9/5094, G06F9/505, G06F9/5044
Abstract: Systems, apparatus, and methods to workload optimize hardware are disclosed herein. An example apparatus includes power control circuitry to determine an application ratio based on an instruction to be executed by one or more cores of a processor to execute a workload, and configure, before the execution of the workload, at least one of (i) the one or more cores of the processor based on the application ratio or (ii) uncore logic of the processor based on the application ratio, and execution circuitry to execute the workload with the at least one of the one or more cores or the uncore logic.
-
Publication No.: US12216734B2
Publication date: 2025-02-04
Application No.: US17133456
Filing date: 2020-12-23
Applicant: Intel Corporation
Inventor: Menachem Adelman, Robert Valentine, Daniel Towner, Amit Gradstein, Mark Jay Charney
Abstract: An apparatus and method for complex matrix conjugation and multiplication. For example, one embodiment of a processor comprises: a decoder to decode a complex matrix conjugation and multiplication instruction including a first source operand to identify a first complex source matrix comprising a first plurality of complex values, a second source operand to identify a second complex source matrix comprising a second plurality of complex values, and a first destination operand to identify a result matrix; execution circuitry to execute the complex matrix conjugation and multiplication instruction, the execution circuitry comprising: matrix conjugation hardware logic to determine a plurality of complex conjugate values corresponding to the first plurality of complex values; transpose hardware logic to transpose the plurality of complex conjugate values to generate a conjugate transpose matrix comprising the complex conjugate values; parallel multiplication circuitry to: multiply real values from the plurality of complex conjugate values of the conjugate transpose matrix with corresponding imaginary values from the second plurality of complex values to generate a first plurality of imaginary products, and multiply imaginary values from the plurality of complex conjugate values of the conjugate transpose matrix with corresponding real values from the second plurality of complex values to generate a second plurality of imaginary products; and addition/subtraction circuitry to add each imaginary product in the first plurality of imaginary products to a corresponding imaginary product in the second plurality of imaginary products to produce a corresponding imaginary component in the result matrix.
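
As a rough software model of the steps named in this abstract (conjugate, transpose, multiply, add), the sketch below computes the imaginary component of conj(A)^T x B. The function name and the split real/imaginary storage are assumptions made for illustration; the abstract does not specify a data layout.

```python
def conj_transpose_multiply_imag(a_re, a_im, b_re, b_im):
    """Imaginary part of (conj(A)^T x B), using real/imaginary planes."""
    # Conjugation: keep the real plane, negate the imaginary plane.
    c_im = [[-v for v in row] for row in a_im]
    # Transpose both planes to form the conjugate transpose matrix.
    h_re = [list(col) for col in zip(*a_re)]
    h_im = [list(col) for col in zip(*c_im)]
    rows, inner, cols = len(h_re), len(b_re), len(b_re[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                ri = h_re[i][k] * b_im[k][j]  # real x imaginary product
                ir = h_im[i][k] * b_re[k][j]  # imaginary x real product
                out[i][j] += ri + ir          # add, then accumulate
    return out
```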
-
Publication No.: US11907713B2
Publication date: 2024-02-20
Application No.: US16729369
Filing date: 2019-12-28
Applicant: Intel Corporation
Inventor: Kermin E. Chofleming, Chuanjun Zhang, Daniel Towner, Simon C. Steely, Jr., Benjamin Keen
CPC classification number: G06F9/3001, G06F9/30181, G06F15/80
Abstract: Systems, methods, and apparatuses relating to a sign modification field for fused operations in a configurable spatial accelerator are described. In one embodiment, a hardware accelerator includes a plurality of processing elements; a network between the plurality of processing elements to transfer values between the plurality of processing elements; and a processing element of the plurality of processing elements comprising: a first plurality of input queues having a multiple bit width coupled to the network, at least one first output queue having the multiple bit width coupled to the network, operation circuitry coupled to the first plurality of input queues having the multiple bit width, a sign modification circuit coupled to the first plurality of input queues having the multiple bit width, and a configuration register within the processing element to store a configuration value comprising a sign modification field that causes the sign modification circuit to modify a sign bit of a value from the first plurality of input queues according to the sign modification field to create a sign modified value, and the configuration value causes the operation circuitry to perform a selected operation of a plurality of operations on a value from the first plurality of input queues and the sign modified value to create a resultant value, and store the resultant value in the at least one first output queue.
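
To make the role of the sign modification field concrete, here is a minimal behavioral sketch of one processing element. Everything named in it is hypothetical: the abstract does not list the field encodings or the available operations, so the modes ("pass", "negate", "abs", "neg_abs") and operations ("add", "mul") are assumptions chosen for illustration.

```python
import operator

# Hypothetical sign modification modes; the real field encodings are not
# given in the abstract.
SIGN_MODS = {
    "pass": lambda v: v,           # leave the sign bit unchanged
    "negate": lambda v: -v,        # flip the sign bit
    "abs": abs,                    # clear the sign bit
    "neg_abs": lambda v: -abs(v),  # set the sign bit
}

# Hypothetical subset of the selectable operations.
OPERATIONS = {"add": operator.add, "mul": operator.mul}


def run_processing_element(config, first, second):
    """Apply the configured sign modification to `second`, then the operation."""
    modified = SIGN_MODS[config["sign_mod"]](second)
    return OPERATIONS[config["op"]](first, modified)
```

Under these assumptions, configuring "add" with "negate" yields a subtraction from a single fused operation, for example `run_processing_element({"op": "add", "sign_mod": "negate"}, 5.0, 3.0)` returns `2.0`.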
-
Publication No.: US12174911B2
Publication date: 2024-12-24
Application No.: US17133473
Filing date: 2020-12-23
Applicant: Intel Corporation
Inventor: Menachem Adelman, Robert Valentine, Daniel Towner, Amit Gradstein, Mark Jay Charney
IPC: G06F17/16
Abstract: An apparatus and method for complex matrix multiplication. For example, one embodiment of a processor comprises: a decoder to decode a first complex matrix multiplication instruction; execution circuitry to execute the first complex matrix multiplication instruction, the execution circuitry comprising parallel multiplication circuitry to multiply real values from the first plurality of real and imaginary values with corresponding real values from the second plurality of real and imaginary values to generate a first plurality of real products, to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the result matrix. The decoder may also decode and the execution circuitry may execute a second complex matrix multiplication instruction to multiply real and imaginary values from the first plurality with corresponding imaginary and real values, respectively, from the second plurality to generate first and second pluralities of imaginary products, and to add corresponding imaginary products to produce a corresponding imaginary value in the result matrix.
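
The two instructions described here split one complex matrix product into its real and imaginary halves. The sketch below models that split in software; the function names and the separate real/imaginary planes are assumptions for illustration, not the patented encoding.

```python
def _paired_products(x, y, p, q, sign):
    """out[i][j] = sum over k of (x[i][k]*y[k][j] + sign * p[i][k]*q[k][j])."""
    rows, inner, cols = len(x), len(y), len(y[0])
    return [
        [
            sum(x[i][k] * y[k][j] + sign * p[i][k] * q[k][j] for k in range(inner))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def complex_matmul_real(a_re, a_im, b_re, b_im):
    """First instruction: (re x re) minus (im x im) gives the real values."""
    return _paired_products(a_re, b_re, a_im, b_im, -1)


def complex_matmul_imag(a_re, a_im, b_re, b_im):
    """Second instruction: (re x im) plus (im x re) gives the imaginary values."""
    return _paired_products(a_re, b_im, a_im, b_re, +1)
```

Running both functions on the same sources yields the complete complex result, which is why the abstract presents them as a pair.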
-
Publication No.: US12086595B2
Publication date: 2024-09-10
Application No.: US17214853
Filing date: 2021-03-27
Applicant: Intel Corporation
Inventor: Menachem Adelman, Robert Valentine, Amit Gradstein, Daniel Towner, Mark Charney
IPC: G06F9/30
CPC classification number: G06F9/3016, G06F9/30025, G06F9/30098
Abstract: Systems, methods, and apparatuses relating to interleaving data values. An embodiment includes decoding circuitry to decode a single instruction, the instruction having one or more fields to specify an opcode, one or more fields to specify a location of a first source operand, one or more fields to specify a location of a second source operand, one or more fields to specify a location of a destination operand, and one or more fields to specify an index value to be used to index a row in the first source operand, wherein the opcode is to indicate execution circuitry is to downconvert data elements of the indexed row of the first source operand, interleave the downconverted elements with data elements of the second source operand, and store the interleaved elements in the destination operand; and execution circuitry to execute the decoded instruction according to the opcode.
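
The index, downconvert, and interleave steps can be sketched as follows. The abstract does not name the data formats or the element order, so this example assumes float32 source elements downconverted to bfloat16 bit patterns by truncation, a second source that already holds 16-bit values, and an output order of downconverted element first. All function names are hypothetical.

```python
import struct


def f32_to_bf16_bits(x):
    """Truncate a float32 value to its upper 16 bits (assumed bfloat16 format)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def downconvert_interleave(src1, row_index, src2):
    """Downconvert row `row_index` of src1 and interleave it with src2."""
    row = src1[row_index]  # the index value selects one row of the first source
    if len(row) != len(src2):
        raise ValueError("indexed row and second source must have equal length")
    dest = []
    for wide, narrow in zip(row, src2):
        dest.append(f32_to_bf16_bits(wide))  # downconverted element (assumed first)
        dest.append(narrow)                  # element from the second source
    return dest
```

For instance, with `src1 = [[0.5, 0.25], [1.0, -2.0]]`, `row_index = 1`, and `src2 = [0x1111, 0x2222]`, the call returns `[0x3F80, 0x1111, 0xC000, 0x2222]`.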