-
21.
Publication number: US20220414182A1
Publication date: 2022-12-29
Application number: US17359519
Application date: 2021-06-26
Applicant: Intel Corporation
Inventor: Menachem ADELMAN, Robert VALENTINE, Zeev SPERBER, Amit GRADSTEIN, Simon RUBANOVICH, Sagi MELLER, Christopher HUGHES, Evangelos GEORGANAS, Alexander HEINECKE, Mark CHARNEY
Abstract: Techniques for matrix multiplication are described. In some examples, decode circuitry is to decode a single instruction having fields for an opcode, an indication of a location of a first source operand, an indication of a location of a second source operand, and an indication of a location of a destination operand, wherein the opcode is to indicate that execution circuitry is to at least convert data elements of the first and second source operands from a first floating point representation to a second floating point representation, perform matrix multiplication with the converted data elements, and accumulate results of the matrix multiplication in the destination operand in the first floating point representation; and the execution circuitry is to execute the decoded instruction as specified by the opcode.
-
Publication number: US20220129273A1
Publication date: 2022-04-28
Application number: US17518235
Application date: 2021-11-03
Applicant: INTEL CORPORATION
Inventor: ElMoustapha OULD-AHMED-VALL, Robert VALENTINE, Mark CHARNEY, Jesus CORBAL, Venkateswara MADDURI
Abstract: An apparatus and method for performing signed multiplication of packed signed doublewords and accumulation with a signed quadword. For example, one exemplary processor comprises three registers and execution circuitry. The execution circuitry is to multiply first and second packed signed doubleword data elements from the first register with third and fourth packed signed doubleword data elements from the second register, respectively, to generate first and second temporary products. It is also to select first, second, third, and fourth signed doubleword data elements. It is also to combine the first temporary product with a first packed signed quadword value read from the third register to generate a first accumulated result and to combine the second temporary product with a second packed signed quadword value read from the third register to generate a second accumulated result. The third register is to store the results.
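The multiply-accumulate behavior described in this abstract can be modeled roughly as follows. This is only a sketch: the function names `wrap_s64` and `mul_accumulate` are hypothetical, registers are modeled as Python lists, and wrap-around on 64-bit overflow is an assumption (the abstract does not say whether the accumulation wraps or saturates).

```python
def wrap_s64(value: int) -> int:
    """Wrap an arbitrary integer into the signed 64-bit range (assumed behavior)."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


def mul_accumulate(src1, src2, acc):
    """Multiply signed doublewords pairwise and add each product to a signed quadword.

    src1, src2: lists of signed 32-bit integers (the two source registers).
    acc: list of signed 64-bit integers (the third register, also the destination).
    """
    for x in list(src1) + list(src2):
        if not -(1 << 31) <= x < (1 << 31):
            raise ValueError("source elements must fit in a signed doubleword")
    # The product of two signed doublewords always fits in a signed quadword;
    # only the final addition can overflow.
    return [wrap_s64(c + a * b) for a, b, c in zip(src1, src2, acc)]
```

For instance, `mul_accumulate([2, -3], [4, 5], [10, 20])` yields `[18, 5]`.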
-
Publication number: US20230098724A1
Publication date: 2023-03-30
Application number: US17485374
Application date: 2021-09-25
Applicant: Intel Corporation
Inventor: Vedvyas SHANBHOGUE, Robert VALENTINE, Mark CHARNEY, Venkateswara MADDURI
IPC: G06F9/30
Abstract: Techniques for copying a subset of status flags from a control and status register to a flags register in response to an instruction are described. An exemplary instruction includes a field for an opcode, the opcode to indicate execution circuitry is to copy from a first register a saturation flag value, an overflow value, and a carry value to a second register into one or more instructions of a different instruction set.
-
Publication number: US20230072105A1
Publication date: 2023-03-09
Application number: US17463410
Application date: 2021-08-31
Applicant: Intel Corporation
Inventor: Alexander HEINECKE, Menachem ADELMAN, Robert VALENTINE, Zeev SPERBER, Amit GRADSTEIN, Mark CHARNEY, Evangelos GEORGANAS, Dhiraj KALAMKAR, Christopher HUGHES, Cristina ANDERSON
IPC: G06F9/30
Abstract: Techniques for comparing BF16 data elements are described. An exemplary BF16 comparison instruction includes fields for an opcode, an identification of a location of a first packed data source operand, and an identification of a location of a second packed data source operand, wherein the opcode is to indicate that execution circuitry is to perform, for a particular data element position of the packed data source operands, a comparison of a data element at that position, and update a flags register based on the comparison.
-
Publication number: US20230068781A1
Publication date: 2023-03-02
Application number: US17463382
Application date: 2021-08-31
Applicant: Intel Corporation
Inventor: Menachem ADELMAN, Alexander HEINECKE, Robert VALENTINE, Zeev SPERBER, Amit GRADSTEIN, Mark CHARNEY, Evangelos GEORGANAS, Dhiraj KALAMKAR, Christopher HUGHES, Cristina ANDERSON
IPC: G06F9/30
Abstract: Techniques for scale and reduction of BF16 data elements are described. An exemplary instruction includes fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating point scale operation of a BF16 data element of the first packed data source by multiplying the data element by a power of 2 value, wherein a value of the exponent of the power of 2 value is a floor value of a BF16 data element of the second packed data source, and store a result of the floating point scale operation into a corresponding data element position of the packed data destination operand.
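The per-element scale operation in this abstract amounts to computing x * 2^floor(y). A minimal sketch follows, with several assumptions: the helper names `to_bf16` and `scale_bf16` are hypothetical, BF16 is emulated by rounding a float32 to its upper 16 bits with round-to-nearest-even, inputs are finite values that fit in float32, and special cases (NaN, infinity, denormals) are not modeled.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round a finite value to the nearest BF16 value (emulated via float32 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 low bits that BF16 discards.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def scale_bf16(src1, src2):
    """For each position, multiply src1[i] by 2 raised to floor(src2[i])."""
    return [to_bf16(math.ldexp(x, math.floor(y))) for x, y in zip(src1, src2)]
```

For instance, `scale_bf16([3.0, 1.0], [2.7, -1.5])` gives `[12.0, 0.25]`, since floor(2.7) is 2 and floor(-1.5) is -2.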
-
Publication number: US20220413861A1
Publication date: 2022-12-29
Application number: US17359522
Application date: 2021-06-26
Applicant: Intel Corporation
Inventor: Venkateswara MADDURI, Cristina ANDERSON, Robert VALENTINE, Mark CHARNEY, Vedvyas SHANBHOGUE
IPC: G06F9/30
Abstract: Techniques for matrix multiplication are described. In some examples, a single instruction having a format of fields for an opcode, one or more fields to indicate a location of a source/destination operand, one or more fields to indicate a location of a first source operand, and one or more fields to indicate a location of a second source operand is used, wherein the opcode is to indicate that execution circuitry is to: multiply values from corresponding data elements of the first and second sources, add a first subset of the multiplied values to a first value from the source/destination operand and store in a first data element position of the source/destination operand, and add a second subset of the multiplied values to a second value from the source/destination operand and store in a second data element position of the source/destination operand.
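The grouped multiply-and-accumulate pattern in this abstract can be illustrated with a short sketch. The function name `dot_accumulate` and the `group` parameter are hypothetical, and the assumption that each subset consists of `group` adjacent products is ours; the abstract does not state how the subsets are chosen or what element widths are involved.

```python
def dot_accumulate(src1, src2, acc, group=2):
    """Multiply corresponding elements, then add each subset of products to acc.

    src1, src2: equal-length lists of source elements.
    acc: source/destination values; acc[i] receives the sum of the i-th
         subset of `group` adjacent products (assumed grouping).
    """
    if len(src1) != len(src2) or len(src1) != group * len(acc):
        raise ValueError("operand lengths do not match the grouping")
    products = [a * b for a, b in zip(src1, src2)]
    return [
        acc[i] + sum(products[i * group:(i + 1) * group])
        for i in range(len(acc))
    ]
```

With `src1 = [1, 2, 3, 4]`, `src2 = [5, 6, 7, 8]` and `acc = [100, 200]`, the products are 5, 12, 21, 32, and the result is `[117, 253]`.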
-
Publication number: US20220129274A1
Publication date: 2022-04-28
Application number: US17524624
Application date: 2021-11-11
Applicant: Intel Corporation
Inventor: Robert C. VALENTINE, Jesus Corbal SAN ADRIAN, Roger Espasa SANS, Robert D. CAVIN, Bret L. TOLL, Santiago Galan DURAN, Jeffrey G. WIEDEMEIER, Sridhar SAMUDRALA, Milind Baburao GIRKAR, Edward Thomas GROCHOWSKI, Jonathan Cannon HALL, Dennis R. BRADFORD, Elmoustapha OULD-AHMED-VALL, James C. ABEL, Mark CHARNEY, Seth ABRAHAM, Suleyman SAIR, Andrew Thomas FORSYTH, Lisa WU, Charles YOUNT
Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
-
Publication number: US20220129268A1
Publication date: 2022-04-28
Application number: US17518336
Application date: 2021-11-03
Applicant: INTEL CORPORATION
Inventor: Venkateswara MADDURI, ElMoustapha OULD-AHMED-VALL, Robert VALENTINE, Mark CHARNEY
IPC: G06F9/30
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data. For example, one embodiment of a processor comprises a decoder to decode a right-shift instruction, a first source register to store a plurality of packed quadword data elements, and execution circuitry to execute the decoded right-shift instruction. The execution circuitry comprises shift circuitry with sign preservation logic to right-shift first and second packed quadword data elements in the first source register by an amount specified in an immediate value or in a control value in a second source register, the right-shifting to generate first and second right-shifted quadwords, the sign preservation logic to shift in the sign bit. The execution circuitry is to cause selection of 16 most significant bits of the first and second right-shifted quadwords to be written to 16 least significant bit regions of first and second quadword data element locations of a destination register.
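The sign-preserving right shift and high-bit selection described in this abstract can be sketched for a single quadword element as below. The names `to_signed64` and `sra_select_high` are hypothetical, and leaving the remaining destination bits unchanged is an assumption; the abstract only states where the selected bits are written.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret a 64-bit pattern as a signed quadword."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def sra_select_high(src: int, dest: int, shift: int, keep_bits: int = 16) -> int:
    """Arithmetic right shift of one quadword, then copy its top bits into dest.

    The top `keep_bits` bits of the shifted value are written to the low
    `keep_bits` bits of `dest`; other bits of `dest` are kept (assumption).
    """
    # Python's >> on a negative int shifts in copies of the sign bit.
    shifted = to_signed64(src) >> shift
    high = (shifted & MASK64) >> (64 - keep_bits)
    low_mask = (1 << keep_bits) - 1
    return (dest & MASK64 & ~low_mask) | high
```

For instance, `sra_select_high(0x8000000000000000, 0, 4)` returns `0xF800`, because the sign bit is replicated into the four vacated positions before the top 16 bits are selected.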
-
29.
Publication number: US20220129267A1
Publication date: 2022-04-28
Application number: US17518291
Application date: 2021-11-03
Applicant: INTEL CORPORATION
Inventor: Venkateswara MADDURI, ElMoustapha OULD-AHMED-VALL, Robert VALENTINE, Mark CHARNEY
IPC: G06F9/30
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data. For example, one processor embodiment comprises a decoder to decode a right-shift instruction, a first source register to store a plurality of packed quadword data elements, and execution circuitry to execute the decoded right-shift instruction. The execution circuitry comprises shift circuitry with sign preservation logic to right-shift first and second packed quadword data elements in the first source register by an amount specified in an immediate value or in a control value in a second source register, the right-shifting to generate first and second right-shifted quadwords, the sign preservation logic to shift in the sign bit. The execution circuitry is to cause selection of 32 most significant bits of the first and second right-shifted quadwords to be written to 32 least significant bit positions of first and second quadword data element locations of a destination register.
-
Publication number: US20190227800A1
Publication date: 2019-07-25
Application number: US16289506
Application date: 2019-02-28
Applicant: Intel Corporation
Inventor: Robert C. VALENTINE, Jesus Corbal SAN ADRIAN, Roger Espasa SANS, Robert D. CAVIN, Bret L. TOLL, Santiago Galan DURAN, Jeffrey G. WIEDEMEIER, Sridhar SAMUDRALA, Milind Baburao GIRKAR, Edward Thomas GROCHOWSKI, Jonathan Cannon HALL, Dennis R. BRADFORD, Elmoustapha OULD-AHMED-VALL, James C. ABEL, Mark CHARNEY, Seth ABRAHAM, Suleyman SAIR, Andrew Thomas FORSYTH, Lisa WU, Charles YOUNT
IPC: G06F9/30
CPC classification number: G06F9/30145, G06F9/3001, G06F9/30014, G06F9/30018, G06F9/30025, G06F9/30032, G06F9/30036, G06F9/30047, G06F9/30149, G06F9/30181, G06F9/30185, G06F9/30192, G06F9/34
Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.