-
公开(公告)号:US12124847B2
公开(公告)日:2024-10-22
申请号:US16474475
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert Valentine , Dan Baum , Zeev Sperber , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Bret L Toll , Mark J. Charney , Barukh Ziv , Alexander Heinecke , Milind Girkar , Menachem Adelman , Simon Rubanovich
CPC classification number: G06F9/30036 , G06F7/485 , G06F7/4876 , G06F7/762 , G06F9/3001 , G06F9/30032 , G06F9/30043 , G06F9/30109 , G06F9/30112 , G06F9/30134 , G06F9/30145 , G06F9/30149 , G06F9/3016 , G06F9/30185 , G06F9/30196 , G06F9/3818 , G06F9/3836 , G06F17/16 , G06F2212/454
Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for a matrix transpose instruction is detailed. In some embodiments, decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to transpose each row of elements of the identified source matrix operand into a corresponding column of the identified destination matrix operand are detailed.
-
12.
公开(公告)号:US20240126546A1
公开(公告)日:2024-04-18
申请号:US18399473
申请日:2023-12-28
Applicant: Intel Corporation
Inventor: Roman S. Dubtsov , Robert Valentine , Jesus Corbal , Milind Girkar , Elmoustapha Ould-Ahmed-Vall
IPC: G06F9/30
CPC classification number: G06F9/30036 , G06F9/3001
Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers identifies a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, generate a complex result by using the four products according to the instruction, and store a result to the corresponding position of the identified destination operand.
-
公开(公告)号:US20240111826A1
公开(公告)日:2024-04-04
申请号:US17937252
申请日:2022-09-30
Applicant: Intel Corporation
Inventor: Jiasheng Chen , Kevin Hurd , Changwon Rhee , Jorge Parra , Fangwen Fu , Theo Drane , William Zorn , Peter Caday , Gregory Henry , Guei-Yuan Lueh , Farzad Chehrazi , Amit Karande , Turbo Majumder , Xinmin Tian , Milind Girkar , Hong Jiang
CPC classification number: G06F17/16 , G06F7/5443 , G06T1/20
Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including a multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as final result in the DP floating-point format.
-
公开(公告)号:US11579871B2
公开(公告)日:2023-02-14
申请号:US17346891
申请日:2021-06-14
Applicant: Intel Corporation
Inventor: Venkateswara R. Madduri , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Mark J. Charney , Carl Murray , Milind Girkar , Bret Toll
Abstract: Embodiments of systems, apparatuses, and methods for performing vector-packed controllable sine and/or cosine operations in a processor are described. For example, execution circuitry executes a decoded instruction to compute at least a real output value and an imaginary output value based on at least a cosine calculation and a sine calculation, the cosine and sine calculations each based on an index value from a packed data source operand, add the index value with an index increment value from the packed data source operand to create an updated index value, and store the real output value, the imaginary output value, and the updated index value to a packed data destination operand.
-
公开(公告)号:US20210357217A1
公开(公告)日:2021-11-18
申请号:US17335942
申请日:2021-06-01
Applicant: Intel Corporation
Inventor: Roman S. Dubtsov , Robert Valentine , Jesus Corbal , Milind Girkar , Elmoustapha Ould-Ahmed-Vall
IPC: G06F9/30
Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add Instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers identifies a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, generate a complex result by using the four products according to the instruction, and store a result to the corresponding position of the identified destination operand.
-
公开(公告)号:US09910796B2
公开(公告)日:2018-03-06
申请号:US13844343
申请日:2013-03-15
Applicant: Intel Corporation
Inventor: Hong Wang , Per Hammarlund , Xiang Zou , John P. Shen , Xinmin Tian , Milind Girkar , Perry H. Wang , Piyush N. Desai
CPC classification number: G06F13/24 , G06F9/3005 , G06F9/3009 , G06F9/30145 , G06F9/3851 , G06F9/4843 , G06F11/3024 , G06F11/348 , G06F12/0875 , G06F2201/86 , G06F2201/88 , G06F2201/885 , G06F2212/452
Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
-
-
-
-
-