-
Publication No.: US11221849B2
Publication Date: 2022-01-11
Application No.: US16642778
Filing Date: 2017-09-27
Applicant: Intel Corporation
Inventor: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate an unsigned fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
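The per-element arithmetic in this abstract can be illustrated with a minimal sketch. It is not the patented circuit: the 16-bit element width, the round-half-up rule, and the function name mul_high_round_unsigned are assumptions made for illustration only.

```python
def mul_high_round_unsigned(src1, src2, bits=16):
    """Multiply corresponding unsigned elements and keep the rounded upper half.

    Assumptions (not stated in the abstract): elements are `bits` wide
    (16 by default) and rounding is round-half-up, done by adding half of
    the discarded lower portion before shifting.
    """
    mask = (1 << bits) - 1
    half = 1 << (bits - 1)
    results = []
    for a, b in zip(src1, src2):
        # Double-sized product: needs up to 2 * bits bits.
        product = (a & mask) * (b & mask)
        # Rounded most significant fixed-sized portion. For unsigned inputs
        # the rounded value always fits in `bits` bits, so no clamping is needed.
        results.append((product + half) >> bits)
    return results
```

For example, mul_high_round_unsigned([0x8000], [0x8000]) returns [0x4000], since the 32-bit product 0x40000000 has 0x4000 in its upper half and the lower half rounds down.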
-
Publication No.: US11704124B2
Publication Date: 2023-07-18
Application No.: US17573556
Filing Date: 2022-01-11
Applicant: Intel Corporation
Inventor: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
CPC classification number: G06F9/3001, G06F9/30036, G06F9/30145, G06F9/3802
Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate an unsigned fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
-
Publication No.: US11579871B2
Publication Date: 2023-02-14
Application No.: US17346891
Filing Date: 2021-06-14
Applicant: Intel Corporation
Inventor: Venkateswara R. Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark J. Charney, Carl Murray, Milind Girkar, Bret Toll
Abstract: Embodiments of systems, apparatuses, and methods for performing vector-packed controllable sine and/or cosine operations in a processor are described. For example, execution circuitry executes a decoded instruction to compute at least a real output value and an imaginary output value based on at least a cosine calculation and a sine calculation, the cosine and sine calculations each based on an index value from a packed data source operand, add the index value with an index increment value from the packed data source operand to create an updated index value, and store the real output value, the imaginary output value, and the updated index value to a packed data destination operand.
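The index-driven sine/cosine step described in this abstract can be sketched for a single element as follows. The abstract does not say how an index maps to an angle, so the mapping 2*pi*index/table_size, the wrap-around of the updated index, and the names sincos_step and table_size are all hypothetical.

```python
import math


def sincos_step(index, increment, table_size=1024):
    """Return (real, imag, updated_index) for one packed element.

    Assumptions (not from the abstract): the index selects the angle
    2*pi*index/table_size, and the updated index wraps modulo table_size.
    """
    angle = 2.0 * math.pi * (index % table_size) / table_size
    real = math.cos(angle)   # real output value from the cosine calculation
    imag = math.sin(angle)   # imaginary output value from the sine calculation
    updated_index = (index + increment) % table_size
    return real, imag, updated_index
```

Feeding the returned index back in on the next call steps through successive angles, which is one plausible reason to store the updated index alongside the real and imaginary outputs.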
-
Publication No.: US11249755B2
Publication Date: 2022-02-15
Application No.: US16642786
Filing Date: 2017-09-27
Applicant: Intel Corporation
Inventor: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
Abstract: Disclosed embodiments relate to executing a vector unsigned multiplication and accumulation instruction. In one example, a processor includes fetch circuitry to fetch a vector unsigned multiplication and accumulation instruction having fields for an opcode, first and second source identifiers, a destination identifier, and an immediate, wherein the identified sources and destination are same-sized registers, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, on each corresponding pair of first and second quadwords of the identified first and second sources, to: generate a sum of products of two doublewords of the first quadword and either two lower words or two upper words of the second quadword, based on the immediate, zero-extend the sum to a quadword-sized sum, and accumulate the quadword-sized sum with a previous value of a destination quadword in a same relative register position as the first and second quadwords.
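A minimal sketch of the per-quadword operation in this abstract follows. The abstract fixes the element sizes (32-bit doublewords, 16-bit words, 64-bit quadwords), but several details here are assumptions: bit 0 of the immediate selects the upper words, the low doubleword pairs with the lower-numbered word, and the accumulation wraps modulo 2^64. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def umac_quadword(src1_q, src2_q, dest_q, imm):
    """Accumulate a sum of two doubleword-by-word products into one quadword.

    Assumptions (not from the abstract): imm bit 0 picks the two upper words
    of the second quadword, pairing is low-with-low, and overflow wraps.
    """
    d0 = src1_q & 0xFFFFFFFF            # lower doubleword of the first quadword
    d1 = (src1_q >> 32) & 0xFFFFFFFF    # upper doubleword of the first quadword
    shift = 32 if (imm & 1) else 0      # select the upper or lower pair of words
    w0 = (src2_q >> shift) & 0xFFFF
    w1 = (src2_q >> (shift + 16)) & 0xFFFF
    # Each product is at most 48 bits, so the sum is at most 49 bits and is
    # zero-extended naturally when held as a quadword-sized value.
    total = d0 * w0 + d1 * w1
    return (dest_q + total) & MASK64


def umac_vector(src1, src2, dest, imm):
    """Apply umac_quadword at every quadword position of same-sized registers."""
    return [umac_quadword(a, b, d, imm) for a, b, d in zip(src1, src2, dest)]
```

With doublewords 2 and 3 in the first quadword and words 5, 7, 11, 13 in the second, an immediate of 0 adds 2*5 + 3*7 = 31 to the destination and an immediate of 1 adds 2*11 + 3*13 = 61.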
-
Publication No.: US11036499B2
Publication Date: 2021-06-15
Application No.: US16613537
Filing Date: 2017-06-30
Applicant: Intel Corporation
Inventor: Venkateswara R. Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark J. Charney, Carl Murray, Milind Girkar, Bret Toll
Abstract: Embodiments of systems, apparatuses, and methods for performing controllable sine and/or cosine operations in a processor are described. For example, execution circuitry executes a decoded instruction to compute at least a real output value and an imaginary output value based on at least a cosine calculation and a sine calculation, the cosine and sine calculations each based on an index value from a packed data source operand, add the index value with an index increment value from the packed data source operand to create an updated index value, and store the real output value, the imaginary output value, and the updated index value to a packed data destination operand.
-
Publication No.: US20190102186A1
Publication Date: 2019-04-04
Application No.: US15721627
Filing Date: 2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara R. Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark Charney, Carl Murray
Abstract: Embodiments of systems, apparatuses, and methods for multiplication and accumulation of data values in a processor are described. For example, execution circuitry executes a decoded instruction to multiply selected unsigned data values from a plurality of packed data element positions in first and second packed data source operands to generate a plurality of first unsigned result values, sum the plurality of first unsigned result values to generate one or more second unsigned result values, accumulate the one or more second unsigned result values with one or more data values from the destination operand to generate one or more third unsigned result values, and store the one or more third unsigned result values in one or more packed data element positions in a destination operand.