-
Publication No.: US20190196827A1
Publication Date: 2019-06-27
Application No.: US15850180
Filing Date: 2017-12-21
Applicant: Intel Corporation
Inventor: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark Charney, Jesus Corbal
Abstract: An apparatus and method for performing signed multiplication of packed signed doublewords and accumulation with a signed quadword. For example, one embodiment of a processor comprises: a first source register to store a first plurality of packed signed doubleword data elements; a second source register to store a second plurality of packed signed doubleword data elements; a third source register to store a plurality of packed signed quadword data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply first and second packed signed doubleword data elements from the first source register with third and fourth packed signed doubleword data elements from the second source register, respectively, to generate first and second temporary signed quadword products, the multiplier circuitry to select the first, second, third, and fourth signed doubleword data elements based on the opcode of the instruction; accumulation circuitry to combine the first temporary signed quadword product with a first packed signed quadword value read from the third source register to generate a first accumulated signed quadword result and to combine the second temporary signed quadword product with a second packed signed quadword value read from the third source register to generate a second accumulated signed quadword result; a destination register or the third source register to store the first accumulated signed quadword result in a first signed quadword data element position and to store the second accumulated signed quadword result in a second signed quadword data element position.
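The arithmetic in this abstract can be sketched in Python. This is an illustration only, not the patented circuit: the function names, the list-based register model, the `select_high` flag standing in for the opcode-based element selection, and the choice of 64-bit wraparound (rather than saturation) are all assumptions.

```python
def wrap_s64(x):
    """Wrap a Python int to a signed 64-bit two's-complement value."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x


def dword_mul_acc_qword(src1, src2, acc, select_high=False):
    """Multiply selected signed doublewords and accumulate into quadwords.

    src1, src2: lists of signed 32-bit ints, two per quadword lane.
    acc:        list of signed 64-bit ints (the third source).
    select_high (hypothetical): pick the odd doubleword of each lane
    instead of the even one, modelling selection by opcode.
    """
    offset = 1 if select_high else 0
    result = []
    for lane, acc_value in enumerate(acc):
        # A 32x32-bit signed product always fits in 64 bits.
        product = src1[2 * lane + offset] * src2[2 * lane + offset]
        result.append(wrap_s64(acc_value + product))
    return result


low = dword_mul_acc_qword([2, -3, 4, 5], [10, 7, -6, 8], [100, -1])
high = dword_mul_acc_qword([2, -3, 4, 5], [10, 7, -6, 8], [100, -1], select_high=True)
```

With these inputs, `low` uses element pairs (2, 10) and (4, -6), while `high` uses (-3, 7) and (5, 8).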
-
Publication No.: US20190196818A1
Publication Date: 2019-06-27
Application No.: US16291245
Filing Date: 2019-03-04
Applicant: Intel Corporation
Inventor: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney
CPC classification number: G06F9/30025, G06F7/483, H03M7/24
Abstract: Embodiments of instructions and methods of execution of said instructions and resources to execute said instructions are detailed. For example, in an embodiment, a processor comprising: decode circuitry to decode an instruction having fields for an opcode, a packed data source operand identifier, and a packed data destination operand identifier; and execution circuitry to execute the decoded instruction to convert a data element from a least significant packed data element position of the identified packed data source operand from a fixed-point representation to a floating point representation, store the floating point representation into a 32-bit least significant packed data element position of the identified packed data destination operand, and zero all remaining packed data elements of the identified packed data destination operand is described.
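The conversion described here can be illustrated with a short Python sketch. It is not taken from the patent: the function names, the list-of-elements register model, the `frac_bits` parameter for the fixed-point format, and the four-element destination are assumptions made for illustration.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def convert_low_fixed_to_float(src, num_dest_elems=4, frac_bits=0):
    """Convert only the least significant source element to float32.

    src:       list of signed integers holding fixed-point values.
    frac_bits: (hypothetical) number of fractional bits; 0 means plain integers.
    Every destination element other than element 0 is zeroed.
    """
    value = src[0] / (1 << frac_bits)
    return [to_f32(value)] + [0.0] * (num_dest_elems - 1)


plain = convert_low_fixed_to_float([7, 9, 11, 13])
scaled = convert_low_fixed_to_float([384, 9, 11, 13], frac_bits=8)
```

Elements 1 to 3 of the source are ignored, and the upper destination elements come out as zero regardless of their previous contents.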
-
Publication No.: US20190163474A1
Publication Date: 2019-05-30
Application No.: US15824339
Filing Date: 2017-11-28
Applicant: Intel Corporation
Inventor: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal
IPC: G06F9/30
Abstract: An embodiment of the invention is a processor including execution circuitry to, in response to a decoded instruction, convert a half-precision floating-point value to a single-precision floating-point value and store the single-precision floating-point value in each of the plurality of element locations of a destination register. The processor also includes a decoder and the destination register. The decoder is to decode an instruction to generate the decoded instruction.
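A minimal Python sketch of the convert-and-broadcast behavior follows. The function names, the raw 16-bit input encoding, and the default of eight destination elements are assumptions for illustration; the abstract does not specify a register width.

```python
import struct


def half_bits_to_float(bits):
    """Decode a raw 16-bit IEEE 754 half-precision pattern."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def to_single(x):
    """Round to single precision (exact for any half-precision input)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def convert_half_and_broadcast(half_bits, num_elems=8):
    """Convert one half-precision value and copy it to every element."""
    value = to_single(half_bits_to_float(half_bits))
    return [value] * num_elems


ones = convert_half_and_broadcast(0x3C00, 4)   # 0x3C00 encodes 1.0
third = convert_half_and_broadcast(0x3555, 2)  # closest half to 1/3
```

Because every half-precision value is exactly representable in single precision, the conversion step loses no information.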
-
Publication No.: US20190102198A1
Publication Date: 2019-04-04
Application No.: US15721616
Filing Date: 2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara R. Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney
IPC: G06F9/30
Abstract: Embodiments of systems, apparatuses, and methods for multiplication and accumulation of signed data values in a processor are described. For example, execution circuitry executes a decoded instruction to multiply selected signed data values from a plurality of packed data element positions in first and second packed data source operands to generate a plurality of first signed result values, sum the plurality of first signed result values to generate one or more second signed result values, accumulate the one or more signed result values with one or more data values from a destination operand to generate one or more third signed result values, and store the one or more third signed result values in one or more packed data element positions in the destination operand.
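The multiply, sum, and accumulate sequence can be sketched in Python as follows. The group size of two source elements per destination element, the 32-bit accumulator width, the wraparound behavior, and all names are assumptions for illustration; the abstract leaves these open.

```python
def wrap_signed(x, bits):
    """Wrap a Python int to a signed two's-complement value of the given width."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def mul_sum_accumulate(src1, src2, dest, group=2, acc_bits=32):
    """For each destination element: dest[i] += sum of products in group i."""
    result = []
    for i, acc in enumerate(dest):
        lo = i * group
        # First result values: element-wise signed products.
        products = [a * b for a, b in zip(src1[lo:lo + group], src2[lo:lo + group])]
        # Second result value: their sum. Third: the sum accumulated with dest.
        result.append(wrap_signed(acc + sum(products), acc_bits))
    return result


accumulated = mul_sum_accumulate([1, -2, 3, 4], [5, 6, -7, 8], [10, -20])
```

Here the first destination element receives 10 + (1*5 + (-2)*6) and the second receives -20 + (3*(-7) + 4*8).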
-
Publication No.: US20190102185A1
Publication Date: 2019-04-04
Application No.: US15721599
Filing Date: 2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara R. Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney
Abstract: Embodiments of systems, apparatuses, and methods for multiplication, negation, and accumulation of data values in a processor are described. For example, execution circuitry executes a decoded instruction to multiply selected data values from a plurality of packed data element positions in first and second packed data source operands to generate a plurality of first result values, sum the plurality of first result values to generate one or more second result values, negate the one or more second result values to generate one or more third result values, accumulate the one or more third result values with one or more data values from the destination operand to generate one or more fourth result values, and store the one or more third result values in one or more packed data element positions in the destination operand.
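A short Python sketch of the multiply, sum, negate, and accumulate steps is given below. It assumes two source elements per destination element, a 32-bit wrapping accumulator, and that the accumulated value is what gets written back; these choices and all names are illustrative assumptions, not details from the patent.

```python
def to_signed(x, bits=32):
    """Wrap a Python int to a signed two's-complement value of the given width."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >= (1 << (bits - 1)) else x


def mul_sum_negate_accumulate(src1, src2, dest, group=2, acc_bits=32):
    """For each destination element: dest[i] += -(sum of products in group i)."""
    result = []
    for i, acc in enumerate(dest):
        lo = i * group
        dot = sum(a * b for a, b in zip(src1[lo:lo + group], src2[lo:lo + group]))
        negated = -dot  # the negation step that distinguishes this operation
        result.append(to_signed(acc + negated, acc_bits))
    return result


negated_acc = mul_sum_negate_accumulate([1, -2, 3, 4], [5, 6, -7, 8], [10, -20])
```

The effect is a subtraction of each group's dot product from the corresponding destination element.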
-
Publication No.: US12229554B2
Publication Date: 2025-02-18
Application No.: US17463405
Filing Date: 2021-08-31
Applicant: Intel Corporation
Inventor: Alexander Heinecke, Menachem Adelman, Robert Valentine, Zeev Sperber, Amit Gradstein, Mark Charney, Evangelos Georganas, Dhiraj Kalamkar, Christopher Hughes, Cristina Anderson
Abstract: Techniques for performing BF16 FMA in response to an instruction are described. In some examples, an instruction has fields for an opcode, an identification of location of a packed data source/destination operand (a first source), an identification of a location of a second packed data source operand, an identification of a location of a third packed data source operand, and an identification of location of a packed data source/destination operand, wherein the opcode is to indicate operand ordering and that execution circuitry is to, per data element position, perform a BF16 value fused multiply-accumulate operation using the first, second, and third source operands and store a result in a corresponding data element position of the source/destination operand.
-
Publication No.: US12204903B2
Publication Date: 2025-01-21
Application No.: US17359522
Filing Date: 2021-06-26
Applicant: Intel Corporation
Inventor: Venkateswara Madduri, Cristina Anderson, Robert Valentine, Mark Charney, Vedvyas Shanbhogue
IPC: G06F9/30
Abstract: Techniques for matrix multiplication are described. In some examples, a single instruction having a format of fields for an opcode, one or more fields to indicate a location of a source/destination operand, one or more fields to indicate a location of a first source operand, and one or more fields to indicate a location of a second source operand is used. Wherein the opcode is to indicate that execution circuitry is to: multiply values from corresponding data elements of the first and second sources, add a first subset of the multiplied values to a first value from the source/destination operand and store in a first data element position of the source/destination operand, and add a second subset of the multiplied values to a second value from the source/destination operand and store in a second data element position of the source/destination operand.
-
Publication No.: US20240045684A1
Publication Date: 2024-02-08
Application No.: US17958380
Filing Date: 2022-10-01
Applicant: Intel Corporation
Inventor: Alexander Heinecke, Menachem Adelman, Mark Charney, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber, Robert Valentine
IPC: G06F9/30
CPC classification number: G06F9/30145, G06F9/30036, G06F9/30018
Abstract: Techniques for converting FP16 to BF8 using bias are described. An example embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand.
-
Publication No.: US11768681B2
Publication Date: 2023-09-26
Application No.: US15879419
Filing Date: 2018-01-24
Applicant: Intel Corporation
Inventor: Alexander Heinecke, Dipankar Das, Robert Valentine, Mark Charney
CPC classification number: G06F9/3001, G06F9/3013, G06F9/30014, G06F9/3016, G06F9/30018, G06F9/30036, G06F9/3893
Abstract: An apparatus and method for performing multiply-accumulate operations. For example, one embodiment of a processor comprises: a decoder to decode instructions; a first source register to store a first plurality of packed bytes; a second source register to store a second plurality of packed bytes; a third source register to store a plurality of packed doublewords; execution circuitry to execute a first instruction, the execution circuitry comprising: extension circuitry to sign-extend or zero-extend the first and second plurality of packed bytes to generate a first and second plurality of words corresponding to the first and second plurality of packed bytes; multiplier circuitry to multiply each of the first plurality of words with a corresponding one of the second plurality of words to generate a plurality of temporary products; adder circuitry to add at least a first set of the temporary products to generate a first temporary sum; accumulation circuitry to combine the first temporary sum with a first packed doubleword value from a first doubleword location in the third source register to generate a first accumulated doubleword result; a destination register to store the first accumulated doubleword result in the first doubleword location.
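The byte multiply-accumulate flow in this abstract can be sketched in Python. The grouping of four byte products per doubleword, the per-source signedness flags, the 32-bit wraparound, and all names are assumptions made for illustration.

```python
def extend_byte(raw, signed):
    """Interpret a raw byte (0..255) as a sign-extended or zero-extended word."""
    raw &= 0xFF
    return raw - 0x100 if signed and raw >= 0x80 else raw


def wrap_s32(x):
    """Wrap a Python int to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def byte_dot_accumulate(src1, src2, acc, src1_signed=False, src2_signed=True):
    """For each doubleword i: acc[i] += sum of four extended byte products."""
    result = []
    for i, acc_value in enumerate(acc):
        total = acc_value
        for j in range(4 * i, 4 * i + 4):
            # Extension step, then word-by-word multiply, then running sum.
            total += extend_byte(src1[j], src1_signed) * extend_byte(src2[j], src2_signed)
        result.append(wrap_s32(total))
    return result


mixed = byte_dot_accumulate([1, 2, 3, 255], [0xFF, 2, 0x80, 1], [1000])
both_signed = byte_dot_accumulate([1, 2, 3, 255], [0xFF, 2, 0x80, 1], [1000],
                                  src1_signed=True)
```

The two calls differ only in how the byte 255 in the first source is extended: as 255 when zero-extended and as -1 when sign-extended.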
-
Publication No.: US11740904B2
Publication Date: 2023-08-29
Application No.: US17524624
Filing Date: 2021-11-11
Applicant: Intel Corporation
Inventor: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
IPC: G06F9/30, G06F9/34, H01L29/78, H01L29/66, H01L29/786, H01L29/775
CPC classification number: G06F9/30145, G06F9/3001, G06F9/30014, G06F9/30025, G06F9/30032, G06F9/30036, G06F9/30047, G06F9/30149, G06F9/30181, G06F9/30185, G06F9/30192, G06F9/34, H01L29/66553, H01L29/775, H01L29/7831, H01L29/78696, G06F9/30018, H01L29/66
Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.