Patent search ap:("Intel Corporation") AND inv:"Mark Charney" Page 2

11.

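Granted invention patent
Vector friendly instruction format and execution thereof (status: in force)

Publication (announcement) number: US12086594B2

Publication (announcement) date: 2024-09-10

Application number: US18239106

Application date: 2023-08-28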

Applicant: Intel Corporation

Inventor: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount

IPC: G06F9/30, G06F9/34, H01L29/66, H01L29/775, H01L29/78, H01L29/786

CPC classification number: G06F9/30145, G06F9/3001, G06F9/30014, G06F9/30025, G06F9/30032, G06F9/30036, G06F9/30047, G06F9/30149, G06F9/30181, G06F9/30185, G06F9/30192, G06F9/34, H01L29/66553, H01L29/775, H01L29/7831, H01L29/78696, G06F9/30018, H01L29/66

Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.

12.

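Granted invention patent
Apparatus and method for complex by complex conjugate multiplication (status: in force)

Publication (announcement) number: US11755323B2

Publication (announcement) date: 2023-09-12

Application number: US17672504

Application date: 2022-02-15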

Applicant: Intel Corporation

Inventor: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Mark Charney, Robert Valentine, Binwei Yang

IPC: G06F9/30

CPC classification number: G06F9/30036, G06F9/3001, G06F9/30105

Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers are described. A processor embodiment includes: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction. The execution circuitry includes: multiplier circuitry to select real and imaginary data elements in the first source register and second source, multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products; adder circuitry to add a first subset of the plurality of imaginary products and subtract a second subset of the plurality of imaginary products to generate a first temporary result, and to add a third subset of the plurality of imaginary products and subtract a fourth subset of the plurality of imaginary products to generate a second temporary result; and accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result, combine the second temporary result with second data from the destination register to generate a second final result, and store the first final result and second final result back in the destination register.
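
The arithmetic behind this abstract can be sketched in a few lines. For a = ar + j·ai and b = br + j·bi, the imaginary part of a × conj(b) is ai·br − ar·bi, which is then combined with a value from the destination. The sketch below is only an illustration under assumptions: the function name `conj_mul_accumulate_imag`, the list-of-pairs layout, and the use of unbounded Python integers (no element widths, rounding, or saturation) are not taken from the patent.

```python
def conj_mul_accumulate_imag(src1, src2, dest):
    """Accumulate Im(a * conj(b)) for each packed complex pair.

    src1, src2: lists of (real, imag) integer pairs (assumed layout).
    dest: list of integer accumulators, one per pair (assumed layout).
    """
    results = []
    for (ar, ai), (br, bi), acc in zip(src1, src2, dest):
        # The two "imaginary products" named in the abstract.
        imag_times_real = ai * br
        real_times_imag = ar * bi
        # Add one product, subtract the other, then accumulate.
        results.append(acc + imag_times_real - real_times_imag)
    return results


packed = conj_mul_accumulate_imag([(1, 2), (3, 4)], [(5, 6), (7, 8)], [10, 20])
```

Python's built-in `complex` type gives the same imaginary parts, which makes it a convenient cross-check for the sign convention.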

13.

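Invention application
INSTRUCTION AND LOGIC FOR SUM OF ABSOLUTE DIFFERENCES (status: in force)

Publication (announcement) number: US20220308881A1

Publication (announcement) date: 2022-09-29

Application number: US17214291

Application date: 2021-03-26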

Applicant: Intel Corporation

Inventor: Deepti Aggarwal, Michael Espig, Robert Valentine, Mark Charney

IPC: G06F9/38, G06F9/30

Abstract: In an embodiment, a processor includes: a fetch circuit to fetch instructions, the instructions including a sum of absolute differences (SAD) instruction; a decode circuit to decode the SAD instruction; and an execution circuit to, during an execution of the decoded SAD instruction, generate an SAD output vector based on a plurality of input vectors, the SAD output vector including a plurality of absolute differences values. Other embodiments are described and claimed.
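
As a rough illustration of the operation named here, the following sketch computes element-wise absolute differences of two input vectors and, separately, their sum. The function names and the plain-list representation are assumptions for illustration; the abstract does not specify element widths or how the output vector is laid out.

```python
def absolute_differences(a, b):
    """Return the element-wise |a[i] - b[i]| for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")
    return [abs(x - y) for x, y in zip(a, b)]


def sum_of_absolute_differences(a, b):
    """Reduce the absolute differences to a single SAD value."""
    return sum(absolute_differences(a, b))


diffs = absolute_differences([10, 20, 30, 40], [12, 18, 33, 40])
total = sum_of_absolute_differences([10, 20, 30, 40], [12, 18, 33, 40])
```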

14.

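Granted invention patent
Apparatus and method for vector multiply and accumulate of packed words (status: in force)

Publication (announcement) number: US11409525B2

Publication (announcement) date: 2022-08-09

Application number: US15879420

Application date: 2018-01-24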

Applicant: Intel Corporation

Inventor: Alexander Heinecke, Dipankar Das, Robert Valentine, Mark Charney

IPC: G06F9/38, G06F9/30

Abstract: An apparatus and method for performing multiply-accumulate operations. For example, one embodiment of a processor comprises: a decoder to decode instructions; a first source register to store a first plurality of packed words; a second source register to store a second plurality of packed words; a third source register to store a plurality of packed quadwords; execution circuitry to execute a first instruction, the execution circuitry comprising: extension circuitry to sign-extend or zero-extend the first and second plurality of packed words to generate a first and second plurality of doublewords corresponding to the first and second plurality of packed words; multiplier circuitry to multiply each of the first plurality of doublewords with a corresponding one of the second plurality of doublewords to generate a plurality of temporary products; adder circuitry to add at least a first set of the temporary products to generate a first temporary sum; accumulation circuitry to combine the first temporary sum with a first packed quadword value from a first quadword location in the third source register to generate a first accumulated quadword result; a destination register to store the first accumulated quadword result in the first quadword location.
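
The data flow in this abstract (extend words, multiply, sum a set of products, accumulate into a quadword) can be modeled as below. This is a sketch under assumptions: it uses sign extension only, groups four 16-bit words per 64-bit accumulator lane, and wraps on overflow. The group size, the wrap-around behavior, and all function names are illustrative and not taken from the patent.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac_packed_words(src1, src2, acc, group=4):
    """Multiply 16-bit words pairwise, sum each group, add to a 64-bit lane."""
    results = []
    for lane, acc_q in enumerate(acc):
        lo = lane * group
        products = [
            sign_extend(x, 16) * sign_extend(y, 16)
            for x, y in zip(src1[lo:lo + group], src2[lo:lo + group])
        ]
        # Wrap to 64 bits to mimic a fixed-width quadword accumulator.
        results.append(sign_extend(acc_q + sum(products), 64))
    return results


lanes = mac_packed_words(
    [0xFFFF, 2, 3, 4, 1, 1, 1, 1],
    [5, 6, 7, 8, 0x8000, 0, 0, 0],
    [100, 0],
)
```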

15.

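Invention application
INSTRUCTIONS TO CONVERT FROM FP16 TO BF8 (status: in force)

Publication (announcement) number: US20220206743A1

Publication (announcement) date: 2022-06-30

Application number: US17134358

Application date: 2020-12-26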

Applicant: Intel Corporation

Inventor: Alexander Heinecke, Naveen Mellempudi, Robert Valentine, Mark Charney, Christopher Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich

IPC: G06F5/00

Abstract: Techniques for converting FP16 to BF8 using bias are described. An exemplary embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed bfloat8 data using bias terms from the identified source/destination operand and store the packed bfloat8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed bfloat8 data using bias terms from the identified source/destination operand and store the packed bfloat8 data into corresponding data element positions of the identified source/destination operand.

16.

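Granted invention patent
Apparatus and method for scaling pre-scaled results of complex multiply-accumulate operations on packed real and imaginary data elements (status: in force)

Publication (announcement) number: US11243765B2

Publication (announcement) date: 2022-02-08

Application number: US15721145

Application date: 2017-09-29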

Applicant: Intel Corporation

Inventor: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Mark Charney, Robert Valentine, Jesus Corbal, Binwei Yang

IPC: G06F9/302, G06F9/30, G06F7/544, G06F17/14, G06F7/48

Abstract: Apparatus and method to transform complex data including a processor that comprises: multiplier circuitry to multiply packed complex N-bit data elements with packed complex M-bit data elements to generate at least four real products; adder circuitry to subtract a first real product from a second real product to generate a first temporary result, subtract a third real product from a fourth real product to generate a second temporary result, add the first temporary result to a first packed N-bit data element to generate a first pre-scaled result, subtract the first temporary result from the first packed N-bit data element to generate a second pre-scaled result, add the second temporary result to a second packed N-bit data element to generate a third pre-scaled result, and subtract the second temporary result from the second packed N-bit data element to generate a fourth pre-scaled result; and scaling circuitry to scale the pre-scaled results.

17.

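Granted invention patent
Vector friendly instruction format and execution thereof (status: in force)

Publication (announcement) number: US11210096B2

Publication (announcement) date: 2021-12-28

Application number: US17004711

Application date: 2020-08-27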

Applicant: Intel Corporation

Inventor: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount

IPC: G06F9/30, G06F9/34

Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.

18.

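Granted invention patent
Floating point to fixed point conversion (status: in force)

Publication (announcement) number: US10763891B2

Publication (announcement) date: 2020-09-01

Application number: US16291231

Application date: 2019-03-04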

Applicant: Intel Corporation

Inventor: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney

IPC: H03M7/24, H03M7/40, H03M7/42

Abstract: Embodiments of an instruction, its operation, and executional support for the instruction are described. In some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, a packed data source operand identifier, and a packed data destination operand identifier; and execution circuitry to execute the decoded instruction to convert a single precision floating point data element of a least significant packed data element position of the identified packed data source operand to a fixed-point representation, store the fixed-point representation as 32-bit integer and a 32-bit integer exponent in the two least significant packed data element positions of the identified packed data destination operand, and zero of all remaining packed data elements of the identified packed data destination operand.
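
One way to picture the conversion described here is to split the lowest single-precision element into an integer significand and an integer exponent, write those into the two lowest destination elements, and zero the rest. The sketch below does that with `math.frexp`. The 31-bit significand scaling, the function names, and the list layout are assumptions for illustration only, since the abstract does not state the exact fixed-point encoding.

```python
import math
import struct


def to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float_to_fixed_lane0(src, mantissa_bits=31):
    """Convert src[0] to (integer, exponent); zero the remaining elements.

    Finite inputs only. The result satisfies
    value == integer * 2**exponent for the float32-rounded src[0].
    """
    value = to_float32(src[0])
    fraction, exponent = math.frexp(value)  # value == fraction * 2**exponent
    integer = int(fraction * (1 << mantissa_bits))  # fits a signed 32-bit int
    return [integer, exponent - mantissa_bits] + [0] * (len(src) - 2)


dest = float_to_fixed_lane0([0.15625, 1.0, 2.0, 3.0])
```

Because a single-precision significand has at most 24 bits, scaling by 2^31 is exact, so `math.ldexp(dest[0], dest[1])` recovers the original value.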

19.

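Granted invention patent
Apparatus and method for multiplying, summing, and accumulating sets of packed bytes (status: in force)

Publication (announcement) number: US10705839B2

Publication (announcement) date: 2020-07-07

Application number: US15850499

Application date: 2017-12-21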

Applicant: Intel Corporation

Inventor: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark Charney, Jesus Corbal

IPC: G06F9/30

Abstract: A processor having a decoder to decode an instruction to generate a decoded instruction; a first source register to store a first plurality of packed signed bytes; a second source register to store a second plurality of packed signed bytes; execution circuitry to execute the decoded instruction, the execution circuitry including: multiplier circuitry to multiply each packed signed byte from the first source register with a corresponding packed signed byte from the second source register to generate temporary products, adder circuitry to add a plurality of sets of the temporary products to generate a plurality of temporary sums; negation and extension circuitry to negate and extend each of the temporary sums to doublewords sums; and accumulation circuitry to add each of the doublewords sums to a doubleword from a third source register to generate final doubleword results; and a packed data destination register to store the final doubleword results.
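
The sequence in this abstract (multiply signed bytes, sum sets of products, negate, extend, accumulate into doublewords) might be modeled as follows. The group of four bytes per doubleword lane, the wrap-around on overflow, and the function names are assumptions for illustration; the abstract does not fix them.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def negated_dot_accumulate_bytes(src1, src2, acc, group=4):
    """Per lane: acc - sum(src1[i] * src2[i]) over a group of signed bytes."""
    results = []
    for lane, acc_d in enumerate(acc):
        lo = lane * group
        temp_sum = sum(
            sign_extend(x, 8) * sign_extend(y, 8)
            for x, y in zip(src1[lo:lo + group], src2[lo:lo + group])
        )
        # Negate the temporary sum, then accumulate with 32-bit wrap-around.
        results.append(sign_extend(acc_d + (-temp_sum), 32))
    return results


dwords = negated_dot_accumulate_bytes(
    [1, 2, 3, 4, 0xFF, 0, 0, 0],
    [1, 1, 1, 1, 2, 0, 0, 0],
    [100, 5],
)
```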

20.

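Invention publication
INSTRUCTIONS TO CONVERT FROM FP16 TO BF8 (status: under examination, published)

Publication (announcement) number: US20240248720A1

Publication (announcement) date: 2024-07-25

Application number: US18627907

Application date: 2024-04-05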

Applicant: Intel Corporation

Inventor: Alexander Heinecke, Naveen Mellempudi, Robert Valentine, Mark Charney, Christopher Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich

IPC: G06F9/30, G06F7/499, H03M7/24

CPC classification number: G06F9/30145, G06F7/49947, G06F9/30025, G06F9/30036, H03M7/24

Abstract: Techniques for converting FP16 data elements to BF8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include a one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
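
A software model of this kind of conversion is sketched below. It assumes BF8 uses a 1-bit sign, 5-bit exponent, 2-bit fraction layout, so that a BF8 value corresponds to the upper byte of an FP16 bit pattern, and it assumes round-to-nearest-even. Neither assumption is stated in the abstract, and the function names are hypothetical.

```python
import struct


def fp16_bits(x):
    """Return the 16-bit IEEE half-precision pattern for a Python float."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_bits_to_bf8(h):
    """Narrow an FP16 bit pattern to 8 bits (assumed 1-5-2 layout)."""
    exponent = (h >> 10) & 0x1F
    fraction = h & 0x3FF
    if exponent == 0x1F and fraction != 0:
        # NaN: keep the sign and exponent, force a nonzero fraction.
        return (h >> 8) | 0x02
    # Round to nearest, ties to even, on the 8 discarded low bits.
    keep_lsb = (h >> 8) & 1
    return ((h + 0x7F + keep_lsb) >> 8) & 0xFF


def convert_packed(values):
    """Convert each element of a packed FP16 vector to its BF8 byte."""
    return [fp16_bits_to_bf8(fp16_bits(v)) for v in values]


packed_bf8 = convert_packed([1.0, 1.125, 1.375, -2.0])
```

Under these assumptions 1.125 sits exactly halfway between two representable values and rounds down to 1.0, while 1.375 rounds up to 1.5.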
