-
Publication No.: EP4418136A2
Publication date: 2024-08-21
Application No.: EP24187271.2
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventors: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
IPC classification: G06F15/76
CPC classification: G06F15/76, G06F9/30036, G06F9/30014, G06F9/30038, G06F9/30018
Abstract: In some embodiments, an apparatus comprises: circuitry to fetch one or more instructions, the one or more instructions to indicate a first source vector comprising a first plurality of integer data elements, a second source vector comprising a second plurality of integer data elements, and one or more accumulation integer data elements, wherein each of the one or more accumulation integer data elements is four times larger than each data element of the first plurality of integer data elements and the second plurality of integer data elements, and wherein the first plurality of integer data elements and the one or more accumulation integer data elements are signed integer data elements and the second plurality of integer data elements are unsigned integer data elements; on-chip storage to store the first plurality of integer data elements, the second plurality of integer data elements, and the one or more accumulation integer data elements; and execution circuitry to execute the one or more instructions to generate one or more result integer data elements. To generate the one or more result integer data elements, the execution circuitry is to: multiply each data element of the first plurality of integer data elements with a corresponding data element of the second plurality of integer data elements to generate a plurality of products, and accumulate the plurality of products in groups of four, each group of four products to be accumulated with a corresponding accumulation integer data element of the one or more accumulation integer data elements with saturation to generate a corresponding one or more result integer data elements.
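The arithmetic described in this abstract can be sketched in plain Python as a rough software model, not as the claimed circuitry. The names `dot4_accumulate_saturating` and `saturate_i32` are hypothetical. The sketch assumes 8-bit source elements and 32-bit accumulators (the abstract states only the four-to-one size ratio) and assumes saturation is applied once, to the exact sum of the accumulator and its four products.

```python
# Assumption: accumulators are signed 32-bit; the abstract only says "four times larger".
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def saturate_i32(value):
    """Clamp an exact integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def dot4_accumulate_saturating(signed_src, unsigned_src, acc):
    """Multiply element-wise, then fold each group of four products into one accumulator.

    signed_src:   signed source elements (assumed range -128..127)
    unsigned_src: unsigned source elements (assumed range 0..255)
    acc:          signed accumulators, one per group of four source elements
    """
    assert len(signed_src) == len(unsigned_src) == 4 * len(acc)
    result = []
    for i, a in enumerate(acc):
        products = [signed_src[4 * i + j] * unsigned_src[4 * i + j] for j in range(4)]
        # Assumption: the sum is formed exactly and saturated once at the end.
        result.append(saturate_i32(a + sum(products)))
    return result
```

For example, with sources `[1, -2, 3, -4]` and `[10, 20, 30, 40]` and an accumulator of `100`, the four products sum to `-100`, so the result element is `0`.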
-
Publication No.: EP3971710A1
Publication date: 2022-03-23
Application No.: EP21207389.4
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventors: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed data bytes from a corresponding packed data element position of the first packed data source operand; zero extend a plurality of packed data bytes from a corresponding packed data element position of the second packed data source operand; multiply each of the sign extended plurality of packed data bytes from the first packed data source operand with a corresponding one of the zero extended plurality of packed data bytes from the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result; and store the addition result in the corresponding packed data element position of the packed data source/destination operand.
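The sign-extend and zero-extend steps in this abstract are easiest to see when the sources are treated as raw bit patterns. The following Python sketch is illustrative only, and all function names are hypothetical. It assumes each source lane is a 32-bit value holding four bytes (lowest byte first), that the source/destination elements are signed 32-bit, and that an addition result that does not fit simply wraps around; the abstract does not state the overflow behavior.

```python
def sign_extend8(byte):
    """Interpret an 8-bit pattern as a signed value (0xFF -> -1)."""
    return byte - 256 if byte & 0x80 else byte


def zero_extend8(byte):
    """Interpret an 8-bit pattern as an unsigned value (0xFF -> 255)."""
    return byte & 0xFF


def wrap_i32(value):
    """Assumption: results wrap to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value & 0x80000000 else value


def lane_bytes(dword):
    """Split a 32-bit lane into its four bytes, lowest byte first (assumed layout)."""
    return [(dword >> (8 * k)) & 0xFF for k in range(4)]


def byte_dot_accumulate(dest, src1, src2):
    """For each lane: dest += sum(sext(src1 byte) * zext(src2 byte))."""
    out = []
    for d, a, b in zip(dest, src1, src2):
        total = d
        for x, y in zip(lane_bytes(a), lane_bytes(b)):
            total += sign_extend8(x) * zero_extend8(y)
        out.append(wrap_i32(total))
    return out
```

With a first-source lane of `0xFF010203` (bytes 3, 2, 1, -1 after sign extension), a second-source lane of `0xFF000001` (bytes 1, 0, 0, 255 after zero extension), and a destination element of `1000`, the products sum to `-252` and the stored element is `748`. The same byte pattern `0xFF` contributes `-1` on one side and `255` on the other, which is the point of the two different extensions.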
-
Publication No.: EP4198718A1
Publication date: 2023-06-21
Application No.: EP23156307.3
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventors: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
Abstract: In some embodiments, an apparatus comprises: decode circuitry to decode a single instruction, the single instruction having fields to indicate an opcode, a packed destination operand, a first packed source operand, and a second packed source operand, wherein elements of the destination are 32 bits in size and elements of the first source and the second source are 16 bits in size; a register file having a plurality of packed data registers including registers for the destination and source operands; and execution circuitry, coupled to the decode circuitry. The execution circuitry is to perform operations corresponding to the instruction, including to, for each element position of the destination: multiply a first element from the first source and a first element from the second source to generate a first result, multiply a second element from the first source and a second element from the second source to generate a second result, add the first result and the second result to generate a third result, add the third result to an element from the element position of the destination to generate a fourth result, and store the fourth result in the element position of the destination.
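The four named results in this abstract map directly onto four lines of code. The sketch below is a hedged Python model in which the names `word_pair_mac` and `wrap_i32` are hypothetical. The 16-bit and 32-bit sizes come from the abstract; treating the elements as signed and wrapping the final sum to 32 bits are assumptions, since the abstract specifies neither.

```python
def wrap_i32(value):
    """Assumption: the stored result wraps to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value & 0x80000000 else value


def word_pair_mac(dest, src1, src2):
    """Two 16-bit source elements per 32-bit destination element.

    src1 and src2 hold already-decoded signed 16-bit values (assumed signedness).
    """
    assert len(src1) == len(src2) == 2 * len(dest)
    out = []
    for i, d in enumerate(dest):
        first = src1[2 * i] * src2[2 * i]            # first result
        second = src1[2 * i + 1] * src2[2 * i + 1]   # second result
        third = first + second                       # third result
        fourth = wrap_i32(third + d)                 # fourth result, stored
        out.append(fourth)
    return out
```

For instance, `word_pair_mac([5, -7], [100, -200, 300, 400], [3, 4, -5, 6])` yields `[-495, 893]`: the first lane computes 300 - 800 + 5 and the second computes -1500 + 2400 - 7.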
-
Publication No.: EP4148563A1
Publication date: 2023-03-15
Application No.: EP22203441.5
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventors: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
Abstract: In some embodiments, an apparatus with execution circuitry is provided. The execution circuitry is to execute a single instruction to, for each result packed data element: preserve an existing value of the result packed data element or set the result packed data element to zero if a corresponding bit value in a writemask register is set to a first value; and if the corresponding bit value in the writemask register is set to a second value, then: multiply a first number of a first source packed data elements with corresponding packed data elements of a second source packed data elements to produce a first number of products, add the first number of products to a corresponding packed data element from a third source packed data elements to produce the result packed data element of a second size in a corresponding position in a source/destination packed data register.
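The writemask behavior in this abstract has two masked-off outcomes (preserve or zero) and one active outcome (multiply and accumulate). A minimal Python sketch follows, with several labeled assumptions: the name `masked_multiply_accumulate` and its `group` and `zeroing` parameters are hypothetical, the "first value" of a mask bit is taken to be 0 and the "second value" 1, the third source is taken to be the source/destination register itself, and element-width handling is left out to keep the masking logic in focus.

```python
def masked_multiply_accumulate(dest, src1, src2, mask, group=4, zeroing=False):
    """Per-element writemask over a grouped multiply-accumulate.

    dest:    source/destination elements (also used as the accumulator input)
    src1/2:  source elements, `group` of them per destination element
    mask:    integer whose bit i controls destination element i
    zeroing: if True, masked-off elements become 0; otherwise they are preserved
    """
    assert len(src1) == len(src2) == group * len(dest)
    out = []
    for i, d in enumerate(dest):
        if not (mask >> i) & 1:
            # Assumption: a 0 bit is the "first value" (element not computed).
            out.append(0 if zeroing else d)
            continue
        lo = group * i
        products = [src1[lo + j] * src2[lo + j] for j in range(group)]
        out.append(d + sum(products))
    return out
```

With `dest = [10, 20]`, `src1 = [1, 2, 3, 4, 5, 6, 7, 8]`, `src2` all ones, and `mask = 0b10`, only the second element is computed (20 + 26 = 46). The first element stays `10` by default and becomes `0` when `zeroing=True`.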
-
Publication No.: EP3971711A1
Publication date: 2022-03-23
Application No.: EP21207395.1
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventors: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed data words from a corresponding packed data element position of the first packed data source operand; sign extend a plurality of packed data words from a corresponding packed data element position of the second packed data source operand; multiply each of the plurality of sign extended packed data words from a corresponding packed data element position of the first packed data source operand with a corresponding one of the plurality of sign extended packed data words from a corresponding packed data element position of the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result; and store the addition result in the corresponding packed data element position of the packed data source/destination operand.
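The word variant in this abstract sign extends both sources. The Python sketch below models it on raw bit patterns, with hypothetical function names throughout. It assumes a "word" is 16 bits, that each 32-bit source lane holds two words (low word first), that the source/destination elements are signed 32-bit, and that an addition result that does not fit wraps around, since this abstract mentions no saturation.

```python
def sign_extend16(word):
    """Interpret a 16-bit pattern as a signed value (0xFFFF -> -1)."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def wrap_i32(value):
    """Assumption: results wrap to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value & 0x80000000 else value


def lane_words(dword):
    """Split a 32-bit lane into two 16-bit words, low word first (assumed layout)."""
    return [dword & 0xFFFF, (dword >> 16) & 0xFFFF]


def word_dot_accumulate(dest, src1, src2):
    """For each lane: dest += sum(sext(src1 word) * sext(src2 word))."""
    out = []
    for d, a, b in zip(dest, src1, src2):
        total = d + sum(
            sign_extend16(x) * sign_extend16(y)
            for x, y in zip(lane_words(a), lane_words(b))
        )
        out.append(wrap_i32(total))
    return out
```

As an example, a first-source lane of `0xFFFF0002` decodes to the words 2 and -1, and a second-source lane of `0x00030004` decodes to 4 and 3, so a destination element of `10` becomes 10 + 8 - 3 = `15`. Under the wrap-around assumption, adding a product of 1 to a destination of 2^31 - 1 stores -2^31.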
-
Publication No.: EP3989062A1
Publication date: 2022-04-27
Application No.: EP21207387.8
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventors: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed signed data words from a corresponding packed data element position of the first packed data source operand; sign extend a plurality of packed signed data words from a corresponding packed data element position of the second packed data source operand; multiply each of the plurality of sign extended packed signed data words from a corresponding packed data element position of the first packed data source operand with a corresponding one of the plurality of sign extended packed signed data words from a corresponding packed data element position of the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result, and saturate the addition result to result in a saturated addition result if a width of the addition result exceeds a width of the second size; and store the addition result or the saturated addition result in the corresponding packed data element position of the packed data source/destination operand.
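The saturation step in this abstract matters even when the destination starts at zero, as a short Python sketch can show. The names `saturate_i32` and `word_dot_accumulate_saturating` are hypothetical. The sketch assumes 16-bit signed words, two words per 32-bit signed destination element, and saturation applied once to the exact addition result.

```python
# Assumption: the "second size" is signed 32-bit.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def saturate_i32(value):
    """Clamp an exact integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def word_dot_accumulate_saturating(dest, src1, src2):
    """src1 and src2 hold already sign-extended 16-bit values, two per dest element."""
    assert len(src1) == len(src2) == 2 * len(dest)
    out = []
    for i, d in enumerate(dest):
        exact = d + src1[2 * i] * src2[2 * i] + src1[2 * i + 1] * src2[2 * i + 1]
        # Store the addition result, or its saturated form if it does not fit.
        out.append(saturate_i32(exact))
    return out
```

The corner case is two products of (-32768) x (-32768): each equals 2^30, so their sum is 2^31, one more than the largest signed 32-bit value. `word_dot_accumulate_saturating([0], [-32768, -32768], [-32768, -32768])` therefore stores `2147483647` rather than a wrapped negative number.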
-
Publication No.: EP3971709A1
Publication date: 2022-03-23
Application No.: EP21207379.5
Filing date: 2016-10-20
Applicant: INTEL Corporation
Inventors: Valentine, Robert; Ryvchin, Galina; Majcher, Piotr; Charney, Mark J.; Ould-Ahmed-Vall, ElMoustapha; Corbal, Jesus; Girkar, Milind B.; Sperber, Zeev; Rubanovich, Simon; Gradstein, Amit
Abstract: In some embodiments, a single instruction is provided that has an opcode, a first field to represent a packed data source/destination operand, a second field to represent a first packed data source operand, and a third field to represent a second packed data source operand. Packed data elements of the first and second packed data source operands are of a first size and packed data elements of the packed data source/destination operand are of a second size greater than the first size. In response to the single instruction, execution circuitry of an apparatus, according to the opcode of the single instruction, for each packed data element position of the packed data source/destination operand is configured to: sign extend a plurality of packed signed data bytes from a corresponding packed data element position of the first packed data source operand; zero extend a plurality of packed unsigned data bytes from a corresponding packed data element position of the second packed data source operand; multiply each of the sign extended plurality of packed signed data bytes from the first packed data source operand with a corresponding one of the zero extended plurality of packed unsigned data bytes from the second packed data source operand to result in a plurality of results; add the plurality of results with a packed data element of the second size of a corresponding packed data element position of the packed data source/destination operand to result in an addition result, and saturate the addition result to result in a saturated addition result if a width of the addition result exceeds a width of the second size; and store the addition result or the saturated addition result in the corresponding packed data element position of the packed data source/destination operand.
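The "addition result or saturated addition result" choice in this abstract can be made visible by returning, alongside each stored element, whether clamping occurred. The Python sketch below is illustrative only: the name `byte_dot_accumulate_saturating` and the returned flag list are hypothetical and are not part of the described instruction. It assumes four bytes per signed 32-bit destination element and takes already-extended values as input.

```python
# Assumption: the "second size" is signed 32-bit.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def byte_dot_accumulate_saturating(dest, signed_bytes, unsigned_bytes):
    """Return (stored elements, per-element flags telling whether saturation applied).

    signed_bytes:   sign-extended first-source bytes, range -128..127
    unsigned_bytes: zero-extended second-source bytes, range 0..255
    """
    assert len(signed_bytes) == len(unsigned_bytes) == 4 * len(dest)
    stored, clipped = [], []
    for i, d in enumerate(dest):
        exact = d + sum(
            signed_bytes[4 * i + j] * unsigned_bytes[4 * i + j] for j in range(4)
        )
        value = max(INT32_MIN, min(INT32_MAX, exact))
        stored.append(value)
        clipped.append(value != exact)  # illustration only, not an architectural flag
    return stored, clipped
```

Four byte products can contribute at most 4 x 127 x 255 = 129540 and at least 4 x (-128) x 255 = -130560, so under these assumptions saturation only occurs when the destination element is already within that distance of a 32-bit limit. For example, a destination of `INT32_MAX - 100` with all-maximum inputs is clamped to `INT32_MAX`, while a destination of `0` with all-minimum signed inputs stores `-130560` unchanged.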