-
公开(公告)号:US10514923B2
公开(公告)日:2019-12-24
申请号:US15850180
申请日:2017-12-21
Applicant: Intel Corporation
Inventor: Venkateswara Madduri , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Mark Charney , Jesus Corbal
Abstract: An apparatus and method for performing signed multiplication of packed signed doublewords and accumulation with a signed quadword. For example, one embodiment of a processor comprises: a first source register to store a first plurality of packed signed doubleword data elements; a second source register to store a second plurality of packed signed doubleword data elements; a third source register to store a plurality of packed signed quadword data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply first and second packed signed doubleword data elements from the first source register with third and fourth packed signed doubleword data elements from the second source register, respectively, to generate first and second temporary signed quadword products, the multiplier circuitry to select the first, second, third, and fourth signed doubleword data elements based on the opcode of the instruction; accumulation circuitry to combine the first temporary signed quadword product with a first packed signed quadword value read from the third source register to generate a first accumulated signed quadword result and to combine the second temporary signed quadword product with a second packed signed quadword value read from the third source register to generate a second accumulated signed quadword result; a destination register or the third source register to store the first accumulated signed quadword result in a first signed quadword data element position and to store the second accumulated signed quadword result in a second signed quadword data element position.
-
公开(公告)号:US20190199370A1
公开(公告)日:2019-06-27
申请号:US16291231
申请日:2019-03-04
Applicant: Intel Corporation
Inventor: Venkateswara Madduri , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Mark Charney
Abstract: Embodiments of an instruction, its operation, and executional support for the instruction are described. In some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, a packed data source operand identifier, and a packed data destination operand identifier; and execution circuitry to execute the decoded instruction to convert a single precision floating point data element of a least significant packed data element position of the identified packed data source operand to a fixed-point representation, store the fixed-point representation as 32-bit integer and a 32-bit integer exponent in the two least significant packed data element positions of the identified packed data destination operand, and zero of all remaining packed data elements of the identified packed data destination operand.
-
63.
公开(公告)号:US20190196823A1
公开(公告)日:2019-06-27
申请号:US15850131
申请日:2017-12-21
Applicant: Intel Corporation
Inventor: Venkateswara Madduri , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Mark Charney
CPC classification number: G06F9/30036 , G06F7/50 , G06F9/3001 , G06F9/30098 , G06F9/30145
Abstract: An apparatus and method for performing a packed horizontal addition of words and doublewords. For example, one embodiment of a processor comprises: a decoder to decode a packed horizontal add instruction to generate a decoded packed horizontal add instruction, the packed horizontal add instruction including an opcode and operands identifying a plurality of packed words; a source register to store a first plurality of packed words; execution circuitry to execute the decoded instruction, the execution circuitry comprising: operand selection circuitry to identify first and second packed words from the source register in accordance with the operand and opcode of the packed horizontal add instruction; adder circuitry to add the first and second packed words to generate a temporary sum; a temporary storage of at least 17 bits to store the temporary sum; saturation circuitry to saturate the temporary sum if necessary to generate a final result; a destination register to store the final result as a packed result word in a designated data element position.
-
公开(公告)号:US10318298B2
公开(公告)日:2019-06-11
申请号:US15721382
申请日:2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara Madduri , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Mark Charney , Jesus Corbal
IPC: G06F9/30
Abstract: An apparatus and method for performing left-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a left-shift instruction to generate a decoded left-shift instruction; a first source register to store a plurality of packed quadwords data elements; execution circuitry to execute the decoded left-shift instruction, the execution circuitry comprising shift circuitry to left-shift at least first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, to generate first and second left-shifted quadwords; the execution circuitry to cause selection of 16 most significant bits of the first and second left-shifted quadwords to be written to 16 least significant bit regions of first and second quadword data element locations, respectively, of a destination register; and the destination register to store the specified set of the 16 most significant bits of the first and second left-shifted quadwords.
-
公开(公告)号:US20190163472A1
公开(公告)日:2019-05-30
申请号:US15824324
申请日:2017-11-28
Applicant: Intel Corporation
Inventor: Robert Valentine , Mark Charney , Raanan Sade , Elmoustapha Ould-Ahmed-Vall , Jesus Corbal , Roman S. Dubtsov
IPC: G06F9/30
Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiply-accumulate of a first complex number, a second complex number, and a third complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder to decode an instruction to generate the decoded instruction and a first source register, a second source register, and a source and destination register to provide the first complex number, the second complex number, and the third complex number, respectively.
-
公开(公告)号:US10224954B1
公开(公告)日:2019-03-05
申请号:US15721573
申请日:2017-09-29
Applicant: INTEL CORPORATION
Inventor: Venkateswara Madduri , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Mark Charney
Abstract: Embodiments of an instruction, its operation, and executional support for the instruction are described. In some embodiments, a processor comprises decode circuitry to decode an instruction having fields for an opcode, a packed data source operand identifier, and a packed data destination operand identifier; and execution circuitry to execute the decoded instruction to convert a single precision floating point data element of a least significant packed data element position of the identified packed data source operand to a fixed-point representation, store the fixed-point representation as 32-bit integer and a 32-bit integer exponent in the two least significant packed data element positions of the identified packed data destination operand, and zero of all remaining packed data elements of the identified packed data destination operand.
-
67.
公开(公告)号:US09513917B2
公开(公告)日:2016-12-06
申请号:US14170397
申请日:2014-01-31
Applicant: Intel Corporation
Inventor: Robert C. Valentine , Jesus Corbal San Adrian , Roger Espasa Sans , Robert D. Cavin , Bret L. Toll , Santiago Galan Duran , Jeffrey G. Wiedemeier , Sridhar Samudrala , Milind Baburao Girkar , Edward Thomas Grochowski , Jonathan Cannon Hall , Dennis R. Bradford , Elmoustapha Ould-Ahmed-Vall , James C. Abel , Mark Charney , Seth Abraham , Suleyman Sair , Andrew Thomas Forsyth , Lisa Wu , Charles Yount
CPC classification number: G06F9/30181 , G06F9/3001 , G06F9/30014 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/30047 , G06F9/30145 , G06F9/30149 , G06F9/30185 , G06F9/30192 , G06F9/34
Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
Abstract translation: 一种向量友好的指令格式及其执行。 根据本发明的一个实施例,处理器被配置为执行指令集。 指令集包括向量友好指令格式。 向量友好指令格式具有多个字段,包括基本操作字段,修改字段,增加操作字段和数据元素宽度字段,其中第一指令格式支持不同版本的基本操作和不同的扩充操作, 基本操作字段,修饰符字段,α字段,β字段和数据元素宽度字段中的不同值,并且其中只有一个不同的值可以被放置在基本操作字段,修饰符字段, 在指令流中的第一指令格式的指令的每次出现时的alpha字段,β字段和数据元素宽度字段。
-
-
-
-
-
-