-
21.
公开(公告)号:US20170169246A1
公开(公告)日:2017-06-15
申请号:US15245113
申请日:2016-08-23
Applicant: Intel Corporation
Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , JESUS CORBAL , BRET L. TOLL , MARK J. CHARNEY
CPC classification number: G06F21/6227 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/30101 , G06F9/3802 , G06F17/30575 , G06F21/6254 , G06F21/70
Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.
-
公开(公告)号:US20220326946A1
公开(公告)日:2022-10-13
申请号:US17589428
申请日:2022-01-31
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , MARK CHARNEY , ROBERT VALENTINE , JESUS CORBAL , BINWEI YANG
Abstract: An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: multiplier circuitry to multiply packed real N-bit data elements in the first source register with packed real M-bit data elements in the second source register and to multiply packed imaginary N-bit data elements in the first source register with packed imaginary M-bit data elements in the second source register to generate at least four real products, adder circuitry to subtract a first selected real product from a second selected real product to generate a first temporary result and to subtract a third selected real product from a fourth selected real product to generate a second temporary result, the adder circuitry to add the first temporary result to a first packed N-bit data element from the third source register to generate a first pre-scaled result, to subtract the first temporary result from the first packed N-bit data element to generate a second pre-scaled result, to add the second temporary result to a second packed N-bit data element from the third source register to generate a third pre-scaled result, and to subtract the second temporary result from the second packed N-bit data element to generate a fourth pre-scaled result; scaling circuitry to scale the first, second, third and fourth pre-scaled results to a specified bit width to generate first, second, third, and fourth final results; and a destination register to store the first, second, third, and fourth final results in specified data element positions.
-
23.
公开(公告)号:US20220215117A1
公开(公告)日:2022-07-07
申请号:US17677958
申请日:2022-02-22
Applicant: Intel Corporation
Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , JESUS CORBAL , BRET L. TOLL , MARK J. CHARNEY
Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.
-
公开(公告)号:US20220171624A1
公开(公告)日:2022-06-02
申请号:US17672504
申请日:2022-02-15
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers are described. A processor embodiment includes: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction. The execution circuitry includes: multiplier circuitry to select real and imaginary data elements in the first source register and second source, multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products; adder circuitry to add a first subset of the plurality of imaginary products and subtract a second subset of the plurality of imaginary products to generate a first temporary result, and to add a third subset of the plurality of imaginary products and subtract a fourth subset of the plurality of imaginary products to generate a second temporary result; and accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result, combine the second temporary result with second data from the destination register to generate a second final result, and store the first final result and second final result back in the destination register.
-
25.
公开(公告)号:US20210004227A1
公开(公告)日:2021-01-07
申请号:US17027230
申请日:2020-09-21
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed byte data elements; a second source register to store a second plurality of packed byte data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed byte data elements of the first plurality with a corresponding packed byte data element of the second plurality to generate a plurality of products; adder circuitry to add specified sets of the products to generate temporary results for each set of products; zero-extension or sign-extension circuitry to zero-extend or sign-extend the temporary result for each set to generate an extended temporary result for each set; accumulation circuitry to combine each of the extended temporary results with a selected packed data value stored in a third source register to generate a plurality of final results; and a destination register to store the plurality of final results as a plurality of packed data elements in specified data element positions.
-
公开(公告)号:US20190196812A1
公开(公告)日:2019-06-27
申请号:US15850412
申请日:2017-12-21
Applicant: Intel Corporation
Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , MARK CHARNEY , JESUS CORBAL , VENKATESWARA MADDURI
IPC: G06F9/30
Abstract: An apparatus and method for performing signed multiplication of packed signed/unsigned doublewords and accumulation with a quadword. For example, one embodiment of a processor comprises: a first source register to store a first plurality of packed doubleword data elements; a second source register to store a second plurality of packed doubleword data elements; a third source register to store a plurality of packed quadword data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply first and second packed doubleword data elements from the first source register with third and fourth packed doubleword data elements from the second source register, respectively, to generate first and second temporary quadword products, the multiplier circuitry to select the first, second, third, and fourth doubleword data elements based on the opcode of the instruction; accumulation circuitry to combine the first temporary quadword product with a first packed quadword value read from the third source register to generate a first accumulated quadword result and to combine the second temporary quadword product with a second packed quadword value read from the third source register to generate a second accumulated quadword result; a destination register or the third source register to store the first accumulated quadword result in a first quadword data element position and to store the second accumulated quadword result in a second quadword data element position.
-
27.
公开(公告)号:US20190102195A1
公开(公告)日:2019-04-04
申请号:US15721471
申请日:2017-09-29
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; a third source register to store a third plurality of packed real and imaginary data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to select real and imaginary data elements in the first and second source registers to multiply based on an immediate of the first instruction, the multiplier circuitry to multiply first packed data elements from the first source register with second packed data elements from the second source register in accordance with the immediate to generate a plurality of real and imaginary products, adder circuitry to select real and imaginary data elements in the third source register based on the immediate, the adder circuitry to add and subtract selected real and imaginary values from the real and imaginary products to generate first real and imaginary results; scaling, rounding, and/or saturation circuitry to scale, round, and/or saturate the first real and imaginary results to generate final real and imaginary data elements; and a destination register to store the final real and imaginary data elements in specified data element positions.
-
公开(公告)号:US20190102193A1
公开(公告)日:2019-04-04
申请号:US15721448
申请日:2017-09-29
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; and execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to select real and imaginary data elements in the first source register and second source register to multiply, the multiplier circuitry to multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and to multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products, adder circuitry to add a first subset of the plurality of imaginary products and subtract a second subset of the plurality of imaginary products to generate a first temporary result and to add a third subset of the plurality of imaginary products and subtract a fourth subset of the plurality of imaginary products to generate a second temporary result, accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result and to combine the second temporary result with second data from the destination register to generate a second final result and to store the first final result and second final result back in the destination register.
-
29.
公开(公告)号:US20190102183A1
公开(公告)日:2019-04-04
申请号:US15721459
申请日:2017-09-29
Applicant: Intel Corporation
Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL , MARK CHARNEY , ROBERT VALENTINE , BINWEI YANG
IPC: G06F9/30
Abstract: An apparatus and method for multiplying packed real and imaginary components of complex numbers. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed real and imaginary data elements; a second source register to store a second plurality of packed real and imaginary data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to select real and imaginary data elements in the first source register and second source register to multiply, the multiplier circuitry to multiply each selected imaginary data element in the first source register with a selected real data element in the second source register, and to multiply each selected real data element in the first source register with a selected imaginary data element in the second source register to generate a plurality of imaginary products, adder circuitry to add a first subset of the plurality of imaginary products to generate a first temporary result and to add a second subset of the plurality of imaginary products to generate a second temporary result; accumulation circuitry to combine the first temporary result with first data from a destination register to generate a first final result and to combine the second temporary result with second data from the destination register to generate a second final result and to store the first final result and second final result back in the destination register.
-
30.
公开(公告)号:US20190095643A1
公开(公告)日:2019-03-28
申请号:US16141283
申请日:2018-09-25
Applicant: INTEL CORPORATION
Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , JESUS CORBAL , BRET L. TOLL , MARK J. CHARNEY
Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.
-
-
-
-
-
-
-
-
-