Patent search ap:("INTEL CORPORATION") AND inv:"Milind Girkar" Page 2

11.

Granted invention patent
Systems, methods, and apparatuses for tile transpose (in force)

Publication number: US12124847B2

Publication date: 2024-10-22

Application number: US16474475

Application date: 2017-07-01

Applicant: Intel Corporation

Inventor: Robert Valentine , Dan Baum , Zeev Sperber , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Bret L Toll , Mark J. Charney , Barukh Ziv , Alexander Heinecke , Milind Girkar , Menachem Adelman , Simon Rubanovich

IPC: G06F9/30 , G06F7/485 , G06F7/487 , G06F7/76 , G06F9/38 , G06F17/16

CPC classification number: G06F9/30036 , G06F7/485 , G06F7/4876 , G06F7/762 , G06F9/3001 , G06F9/30032 , G06F9/30043 , G06F9/30109 , G06F9/30112 , G06F9/30134 , G06F9/30145 , G06F9/30149 , G06F9/3016 , G06F9/30185 , G06F9/30196 , G06F9/3818 , G06F9/3836 , G06F17/16 , G06F2212/454

Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for a matrix transpose instruction is detailed. In some embodiments, decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to transpose each row of elements of the identified source matrix operand into a corresponding column of the identified destination matrix operand are detailed.
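The row-to-column behaviour described in this abstract can be modelled in software. The following is a minimal Python sketch under stated assumptions, not the patented hardware: the function name `tile_transpose` and the list-of-lists representation of a tile are hypothetical and chosen only for illustration.

```python
def tile_transpose(src):
    """Software model: copy each row of `src` into the matching column of a new tile.

    `src` is assumed to be a rectangular list of lists (rows x cols).
    """
    rows = len(src)
    cols = len(src[0]) if rows else 0
    # The destination has the swapped shape: cols x rows.
    dst = [[None] * rows for _ in range(cols)]
    for r in range(rows):
        for c in range(cols):
            # Element (r, c) of the source lands at (c, r) of the destination.
            dst[c][r] = src[r][c]
    return dst
```

For a 2x3 source tile the sketch returns a 3x2 destination in which row 0 of the source has become column 0 of the result.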

12.

Published invention application
SYSTEMS AND METHODS FOR EXECUTING A FUSED MULTIPLY-ADD INSTRUCTION FOR COMPLEX NUMBERS (under examination, published)

Publication number: US20240126546A1

Publication date: 2024-04-18

Application number: US18399473

Application date: 2023-12-28

Applicant: Intel Corporation

Inventor: Roman S. Dubtsov , Robert Valentine , Jesus Corbal , Milind Girkar , Elmoustapha Ould-Ahmed-Vall

IPC: G06F9/30

CPC classification number: G06F9/30036 , G06F9/3001

Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers identifies a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, generate a complex result by using the four products according to the instruction, and store a result to the corresponding position of the identified destination operand.
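The four products named in this abstract can be illustrated with a short Python sketch. It rests on assumptions the abstract does not confirm: that the instruction performs an ordinary (non-conjugate) complex multiply, that the result is added to the existing destination value, and that complex numbers are stored as `(real, imag)` pairs. The name `complex_fma` is hypothetical, and a software version rounds after each step, so it is not truly fused.

```python
def complex_fma(acc, a, b):
    """Per-element acc + a * b for lists of (real, imag) pairs.

    Assumption: plain complex multiply-accumulate; the patented
    instruction may combine the four products differently.
    """
    out = []
    for (cr, ci), (ar, ai), (br, bi) in zip(acc, a, b):
        rr = ar * br  # product of real components
        ii = ai * bi  # product of imaginary components
        ri = ar * bi  # first mixed product
        ir = ai * br  # second mixed product
        # Real part uses rr - ii, imaginary part uses ri + ir.
        out.append((cr + rr - ii, ci + ri + ir))
    return out
```

With a zero accumulator, multiplying 1+2i by 3+4i gives the pair (-5.0, 10.0), which matches the usual complex product.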

13.

Published invention application
HARDWARE ENHANCEMENTS FOR DOUBLE PRECISION SYSTOLIC SUPPORT (under examination, published)

Publication number: US20240111826A1

Publication date: 2024-04-04

Application number: US17937252

Application date: 2022-09-30

Applicant: Intel Corporation

Inventor: Jiasheng Chen , Kevin Hurd , Changwon Rhee , Jorge Parra , Fangwen Fu , Theo Drane , William Zorn , Peter Caday , Gregory Henry , Guei-Yuan Lueh , Farzad Chehrazi , Amit Karande , Turbo Majumder , Xinmin Tian , Milind Girkar , Hong Jiang

IPC: G06F17/16 , G06F7/544 , G06T1/20

CPC classification number: G06F17/16 , G06F7/5443 , G06T1/20

Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit is to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as a final result in the DP floating-point format.
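The two-pass flow in this abstract (accumulate in a wider intermediate format, then round once at the end) can be sketched in Python. This is a software analogy under stated assumptions, not the disclosed circuit: `fractions.Fraction` stands in for the high precision intermediate format, the operands are split into two halves to represent the two passes, and the names `dp_pass` and `dp_dot_two_pass` are hypothetical.

```python
from fractions import Fraction


def dp_pass(a, b, carry_in):
    """One pass: add the products of a[i] * b[i] to carry_in, exactly."""
    total = Fraction(carry_in)
    for x, y in zip(a, b):
        # Fraction(float) is exact, so no rounding happens inside the pass.
        total += Fraction(x) * Fraction(y)
    return total


def dp_dot_two_pass(a, b, c):
    """Dot product of a and b plus c, rounded to double only once."""
    half = len(a) // 2
    # First pass starts from the third source operand c.
    first = dp_pass(a[:half], b[:half], c)
    # Second pass starts from the intermediate result of the first pass.
    second = dp_pass(a[half:], b[half:], first)
    # Down-convert and round the final value to a double.
    return float(second)
```

Because rounding is deferred to the end, `dp_dot_two_pass([1e16, 1.0, -1e16, 1.0], [1.0] * 4, 0.0)` yields 2.0, whereas summing the same products step by step in double precision loses the first small term.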

14.

Granted invention patent
Systems, apparatuses, and methods for controllable sine and/or cosine operations (in force)

Publication number: US11579871B2

Publication date: 2023-02-14

Application number: US17346891

Application date: 2021-06-14

Applicant: Intel Corporation

Inventor: Venkateswara R. Madduri , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Mark J. Charney , Carl Murray , Milind Girkar , Bret Toll

IPC: G06F9/30 , G06F7/548

Abstract: Embodiments of systems, apparatuses, and methods for performing vector-packed controllable sine and/or cosine operations in a processor are described. For example, execution circuitry executes a decoded instruction to compute at least a real output value and an imaginary output value based on at least a cosine calculation and a sine calculation, the cosine and sine calculations each based on an index value from a packed data source operand, add the index value with an index increment value from the packed data source operand to create an updated index value, and store the real output value, the imaginary output value, and the updated index value to a packed data destination operand.
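The per-element behaviour in this abstract (compute a cosine and a sine from an index, then advance the index by an increment) can be sketched as follows. The abstract does not say how an index maps to an angle, so the mapping `angle = 2 * pi * index / period` and the `period` parameter are assumptions; the function name `sincos_step` and the tuple layout are likewise hypothetical.

```python
import math


def sincos_step(elements, period):
    """Process a list of (index, increment) pairs.

    Returns (real, imag, updated_index) per element, where
    real = cos(angle) and imag = sin(angle).
    Assumption: the index is a position on a circle of `period` steps.
    """
    out = []
    for index, increment in elements:
        angle = 2.0 * math.pi * index / period
        real = math.cos(angle)
        imag = math.sin(angle)
        # The updated index is stored alongside the two outputs.
        out.append((real, imag, index + increment))
    return out
```

Feeding each output's updated index back in as the next input walks around the unit circle in fixed steps, which is the kind of use such an instruction appears aimed at.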

15.

Invention application
SYSTEMS AND METHODS FOR EXECUTING A FUSED MULTIPLY-ADD INSTRUCTION FOR COMPLEX NUMBERS (in force)

Publication number: US20210357217A1

Publication date: 2021-11-18

Application number: US17335942

Application date: 2021-06-01

Applicant: Intel Corporation

Inventor: Roman S. Dubtsov , Robert Valentine , Jesus Corbal , Milind Girkar , Elmoustapha Ould-Ahmed-Vall

IPC: G06F9/30

Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers identifies a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, generate a complex result by using the four products according to the instruction, and store a result to the corresponding position of the identified destination operand.

16.

Granted invention patent
Programmable event driven yield mechanism which may activate other threads (in force)

Publication number: US09910796B2

Publication date: 2018-03-06

Application number: US13844343

Application date: 2013-03-15

Applicant: Intel Corporation

Inventor: Hong Wang , Per Hammarlund , Xiang Zou , John P. Shen , Xinmin Tian , Milind Girkar , Perry H. Wang , Piyush N. Desai

IPC: G06F9/38 , G06F13/24 , G06F9/30 , G06F9/48 , G06F11/34

CPC classification number: G06F13/24 , G06F9/3005 , G06F9/3009 , G06F9/30145 , G06F9/3851 , G06F9/4843 , G06F11/3024 , G06F11/348 , G06F12/0875 , G06F2201/86 , G06F2201/88 , G06F2201/885 , G06F2212/452

Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
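The monitor-and-handler idea in this abstract can be illustrated with a small software analogy. It is only a sketch under assumptions: "low level of progress" is modelled as the number of instructions retired in a sampling window falling below a programmable threshold, and the class `ProgressMonitor` with its `observe` method is hypothetical rather than anything named in the patent.

```python
class ProgressMonitor:
    """Calls a handler when observed progress drops below a threshold."""

    def __init__(self, min_retired, handler):
        # min_retired is the programmable condition (an assumption here).
        self.min_retired = min_retired
        self.handler = handler

    def observe(self, retired_in_window):
        """Return True if control was transferred to the handler."""
        if retired_in_window < self.min_retired:
            # Low progress detected: disrupt normal flow and run the handler,
            # which could, for example, wake a helper thread.
            self.handler(retired_in_window)
            return True
        return False
```

A handler registered this way only runs for windows that fall below the threshold, so a program making normal progress is never interrupted.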
