Patent search ap:("Intel Corporation") AND inv:"Jesus Corbal" Page 2

11.

Granted invention patent
Instructions for vector multiplication of unsigned words with rounding (in force)

Publication (announcement) number: US11704124B2

Publication (announcement) date: 2023-07-18

Application number: US17573556

Application date: 2022-01-11

Applicant: Intel Corporation

Inventor: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal

IPC: G06F9/22, G06F9/30, G06F9/38

CPC classification number: G06F9/3001, G06F9/30036, G06F9/30145, G06F9/3802

Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate an unsigned fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
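The per-element arithmetic in this abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the 16-bit word size, the round-half-up rounding mode, and the function name `mul_high_round_u16` are assumptions, since the abstract only says the upper portion of the double-sized product is rounded.

```python
def mul_high_round_u16(src1, src2):
    """Multiply pairs of unsigned 16-bit words and keep the rounded upper half.

    Assumption: rounding is round-half-up on the discarded low 16 bits.
    """
    results = []
    for x, y in zip(src1, src2):
        product = (x & 0xFFFF) * (y & 0xFFFF)  # double-sized (32-bit) product
        rounded = (product + 0x8000) >> 16     # add half an LSB, then drop the low half
        results.append(rounded & 0xFFFF)       # unsigned fixed-sized result
    return results


print(mul_high_round_u16([0xFFFF, 0x8000, 0x7FFF], [0xFFFF, 0x0001, 0x0001]))
```

With these inputs the largest possible product, 0xFFFF * 0xFFFF, rounds to 0xFFFE, so the rounding step cannot overflow the 16-bit destination.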

12.

Granted invention patent
Systems, methods, and apparatuses for dot product operations (in force)

Publication (announcement) number: US11669326B2

Publication (announcement) date: 2023-06-06

Application number: US15859271

Application date: 2017-12-29

Applicant: Intel Corporation

Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman

IPC: G06F9/30, G06F17/16

CPC classification number: G06F9/30014, G06F9/30109, G06F9/30145, G06F17/16

Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions include computing a dot product of signed words and accumulating in quadword data elements of a matrix pair. Additionally, in some instances, non-accumulating quadword data elements of the matrix pair are set to zero.

13.

Granted invention patent
Systems and methods to load a tile register pair (in force)

Publication (announcement) number: US11609762B2

Publication (announcement) date: 2023-03-21

Application number: US17398927

Application date: 2021-08-10

Applicant: Intel Corporation

Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman

IPC: G06F9/30, G06F9/38

Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.

14.

Granted invention patent
Systems, apparatuses, and methods for controllable sine and/or cosine operations (in force)

Publication (announcement) number: US11579871B2

Publication (announcement) date: 2023-02-14

Application number: US17346891

Application date: 2021-06-14

Applicant: Intel Corporation

Inventor: Venkateswara R. Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark J. Charney, Carl Murray, Milind Girkar, Bret Toll

IPC: G06F9/30, G06F7/548

Abstract: Embodiments of systems, apparatuses, and methods for performing vector-packed controllable sine and/or cosine operations in a processor are described. For example, execution circuitry executes a decoded instruction to compute at least a real output value and an imaginary output value based on at least a cosine calculation and a sine calculation, the cosine and sine calculations each based on an index value from a packed data source operand, add the index value with an index increment value from the packed data source operand to create an updated index value, and store the real output value, the imaginary output value, and the updated index value to a packed data destination operand.
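One step of the index-driven sine/cosine operation described above might look like the Python sketch below. The mapping from index to angle (a full turn divided into `table_size` steps), the wrap-around of the updated index, and the names `sincos_step` and `table_size` are all assumptions for illustration; the abstract does not specify them.

```python
import math


def sincos_step(index, increment, table_size=1024):
    """Return (real, imag, updated_index) for one packed element.

    Assumption: the index selects an angle of 2*pi*index/table_size.
    """
    angle = 2.0 * math.pi * (index % table_size) / table_size
    real = math.cos(angle)                        # real output from the cosine calculation
    imag = math.sin(angle)                        # imaginary output from the sine calculation
    updated_index = (index + increment) % table_size  # index plus its increment (wrap assumed)
    return real, imag, updated_index


print(sincos_step(0, 256))  # (1.0, 0.0, 256)
```

Feeding the returned index back in on the next call steps around the unit circle, which is the pattern a twiddle-factor generator would follow.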

15.

Granted invention patent
Apparatus and method of improved insert instructions (in force)

Publication (announcement) number: US11347502B2

Publication (announcement) date: 2022-05-31

Application number: US15476356

Application date: 2017-03-31

Applicant: Intel Corporation

Inventor: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein

IPC: G06F9/30, G06F12/06, G06F9/38

Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non-overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non-overlapping sections has a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non-overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non-overlapping sections has a same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and to mask the second and fourth instructions at a second resultant vector granularity.

16.

Invention application
SYSTEMS AND METHODS TO LOAD A TILE REGISTER PAIR (in force)

Publication (announcement) number: US20220091848A1

Publication (announcement) date: 2022-03-24

Application number: US17398927

Application date: 2021-08-10

Applicant: Intel Corporation

Inventor: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman

IPC: G06F9/30

Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.

17.

Granted invention patent
Systems, methods, and apparatuses for tile broadcast (in force)

Publication (announcement) number: US11263008B2

Publication (announcement) date: 2022-03-01

Application number: US16487774

Application date: 2017-07-01

Applicant: Intel Corporation

Inventor: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Jesus Corbal, Alexander Heinecke, Barukh Ziv, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Stanislav Shwartsman

IPC: G06F9/30, G06F7/485, G06F7/487, G06F17/16, G06F7/76, G06F9/38

Abstract: Embodiments detailed herein relate to matrix operations. In particular, embodiments of broadcasting elements are described. For example, some embodiments describe broadcasting a scalar to all configured data element positions of a destination matrix (tile). For example, some embodiments describe broadcasting a row to all configured data element positions of a destination matrix (tile). For example, some embodiments describe broadcasting a column to all configured data element positions of a destination matrix (tile).
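The three broadcast variants named in this abstract can be illustrated with plain Python lists standing in for tiles. This is only a sketch: the function names are hypothetical, and the configured tile shape is passed as explicit `rows` and `cols` arguments rather than read from any tile configuration state.

```python
def broadcast_scalar(rows, cols, value):
    """Fill every configured element position with one scalar."""
    return [[value] * cols for _ in range(rows)]


def broadcast_row(rows, row):
    """Copy one source row into every row of the destination tile."""
    return [list(row) for _ in range(rows)]


def broadcast_column(cols, column):
    """Copy one source column into every column of the destination tile."""
    return [[value] * cols for value in column]


print(broadcast_scalar(2, 3, 7))      # [[7, 7, 7], [7, 7, 7]]
print(broadcast_row(2, [1, 2, 3]))    # [[1, 2, 3], [1, 2, 3]]
print(broadcast_column(3, [1, 2]))    # [[1, 1, 1], [2, 2, 2]]
```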

18.

Granted invention patent
Instruction execution that broadcasts and masks data values at different levels of granularity (in force)

Publication (announcement) number: US11250154B2

Publication (announcement) date: 2022-02-15

Application number: US16730686

Application date: 2019-12-30

Applicant: Intel Corporation

Inventor: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney

IPC: G06F21/62, G06F16/27, G06F21/70, G06F9/30, G06F9/38

Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.

19.

Granted invention patent
Vector instructions for selecting and extending an unsigned sum of products of words and doublewords for accumulation (in force)

Publication (announcement) number: US11249755B2

Publication (announcement) date: 2022-02-15

Application number: US16642786

Application date: 2017-09-27

Applicant: Intel Corporation

Inventor: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal

IPC: G06F9/302, G06F9/30

Abstract: Disclosed embodiments relate to executing a vector unsigned multiplication and accumulation instruction. In one example, a processor includes fetch circuitry to fetch a vector unsigned multiplication and accumulation instruction having fields for an opcode, first and second source identifiers, a destination identifier, and an immediate, wherein the identified sources and destination are same-sized registers, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, on each corresponding pair of first and second quadwords of the identified first and second sources, to: generate a sum of products of two doublewords of the first quadword and either two lower words or two upper words of the second quadword, based on the immediate, zero-extend the sum to a quadword-sized sum, and accumulate the quadword-sized sum with a previous value of a destination quadword in a same relative register position as the first and second quadwords.
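The per-quadword operation in this abstract can be sketched in Python as below. Several details are assumptions made for illustration: which doubleword is paired with which word, the use of a boolean `select_upper` in place of the immediate field, wrap-around (rather than saturating) accumulation, and the function name `umac_quadword`.

```python
MASK64 = (1 << 64) - 1


def umac_quadword(dst, src1, src2, select_upper):
    """Accumulate an unsigned sum of two doubleword-by-word products into dst."""
    d0 = src1 & 0xFFFFFFFF                # lower doubleword of the first quadword
    d1 = (src1 >> 32) & 0xFFFFFFFF        # upper doubleword of the first quadword
    shift = 32 if select_upper else 0     # immediate selects upper or lower word pair
    w0 = (src2 >> shift) & 0xFFFF
    w1 = (src2 >> (shift + 16)) & 0xFFFF
    total = d0 * w0 + d1 * w1             # below 2**49, so zero-extension to 64 bits is lossless
    return (dst + total) & MASK64         # accumulate with the previous destination value


src1 = (3 << 32) | 2
src2 = (7 << 48) | (6 << 32) | (5 << 16) | 4
print(umac_quadword(10, src1, src2, False))  # 33
print(umac_quadword(10, src1, src2, True))   # 43
```

A full vector version would apply the same function to each quadword lane of the two sources and the destination.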

20.

Granted invention patent
Apparatus and method for scaling pre-scaled results of complex multiply-accumulate operations on packed real and imaginary data elements (in force)

Publication (announcement) number: US11243765B2

Publication (announcement) date: 2022-02-08

Application number: US15721145

Application date: 2017-09-29

Applicant: Intel Corporation

Inventor: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Mark Charney, Robert Valentine, Jesus Corbal, Binwei Yang

IPC: G06F9/302, G06F9/30, G06F7/544, G06F17/14, G06F7/48

Abstract: Apparatus and method to transform complex data including a processor that comprises: multiplier circuitry to multiply packed complex N-bit data elements with packed complex M-bit data elements to generate at least four real products; adder circuitry to subtract a first real product from a second real product to generate a first temporary result, subtract a third real product from a fourth real product to generate a second temporary result, add the first temporary result to a first packed N-bit data element to generate a first pre-scaled result, subtract the first temporary result from the first packed N-bit data element to generate a second pre-scaled result, add the second temporary result to a second packed N-bit data element to generate a third pre-scaled result, and subtract the second temporary result from the second packed N-bit data element to generate a fourth pre-scaled result; and scaling circuitry to scale the pre-scaled results.
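The adder and scaling stages in this abstract can be traced with the short Python sketch below, which follows the abstract's wording step by step. It takes the four real products as given, because the abstract does not say which element pairs produce them. Scaling by an arithmetic right shift, and the names `butterfly_scaled` and `shift`, are assumptions.

```python
def butterfly_scaled(p1, p2, p3, p4, x1, x2, shift=1):
    """Combine four real products with two packed elements, then scale.

    Assumption: scaling is an arithmetic right shift by `shift` bits.
    """
    t1 = p2 - p1                       # first temporary result
    t2 = p4 - p3                       # second temporary result
    pre_scaled = (
        x1 + t1,                       # first pre-scaled result
        x1 - t1,                       # second pre-scaled result
        x2 + t2,                       # third pre-scaled result
        x2 - t2,                       # fourth pre-scaled result
    )
    return tuple(value >> shift for value in pre_scaled)


print(butterfly_scaled(1, 4, 2, 10, 20, 30))  # (11, 8, 19, 11)
```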
