-
91.
Publication No.: US20190146791A1
Publication Date: 2019-05-16
Application No.: US16246438
Application Date: 2019-01-11
Applicant: INTEL CORPORATION
Inventor: Terence SYCH , Elmoustapha OULD-AHMED-VALL
IPC: G06F9/30 , G06F12/0875
Abstract: Instructions and logic provide SIMD vector population count functionality. Some embodiments store in each data field of a portion of n data fields of a vector register or memory vector, at least two bits of data. In a processor, a SIMD instruction for a vector population count is executed, such that for that portion of the n data fields in the vector register or memory vector, the occurrences of binary values equal to each of a first one or more predetermined binary values, are counted and the counted occurrences are stored, in a portion of a destination register corresponding to the portion of the n data fields in the vector register or memory vector, as a first one or more counts corresponding to the first one or more predetermined binary values.
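The counting behavior in this abstract can be illustrated with a scalar model. This is a minimal sketch, not the claimed hardware: the function name `vector_popcount`, the list-of-integers representation of the data fields, and the default 2-bit field width are assumptions made for illustration only.

```python
from collections import Counter

def vector_popcount(fields, targets, field_bits=2):
    """Count how many data fields equal each predetermined value.

    fields     -- data fields of the source vector (hypothetical list form)
    targets    -- the predetermined binary values to count
    field_bits -- width of each field; the abstract requires at least 2
    """
    mask = (1 << field_bits) - 1
    # Tally every field once, keeping only the low field_bits of each.
    counts = Counter(f & mask for f in fields)
    # One count per predetermined value, in the order given.
    return [counts[t] for t in targets]

# Eight 2-bit fields; count occurrences of 0b01 and 0b11.
fields = [0b00, 0b01, 0b11, 0b01, 0b10, 0b01, 0b00, 0b11]
print(vector_popcount(fields, targets=[0b01, 0b11]))  # [3, 2]
```

A real SIMD instruction would produce these counts for all fields in parallel and write them to a destination register; the loop here only models the result.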
-
92.
Publication No.: US20190108030A1
Publication Date: 2019-04-11
Application No.: US16145160
Application Date: 2018-09-27
Applicant: Intel Corporation
Inventor: Jesus CORBAL SAN ADRIAN , Bret L. TOLL , Robert C. VALENTINE , Jeffrey G. WIEDEMEIER , Sridhar SAMUDRALA , Milind Baburao GIRKAR , Andrew Thomas FORSYTH , Elmoustapha OULD-AHMED-VALL , Dennis R. BRADFORD , Lisa K. WU
IPC: G06F9/30
Abstract: Embodiments of systems, apparatuses, and methods for performing a blend instruction in a computer processor are described. In some embodiments, the execution of a blend instruction causes a data element-by-element selection of data elements of first and second source operands using the corresponding bit positions of a writemask as a selector between the first and second operands and storage of the selected data elements into the destination at the corresponding position in the destination.
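The element-by-element selection described here can be modeled in a few lines. This is a rough sketch under assumptions: the name `blend` is hypothetical, operands are plain lists, and the choice that a set mask bit selects the second source (and a clear bit the first) is an assumption, since the abstract does not state the polarity.

```python
def blend(src1, src2, writemask):
    """Select per element between two sources using writemask bits.

    Assumption: bit i set -> take src2[i]; bit i clear -> take src1[i].
    """
    if len(src1) != len(src2):
        raise ValueError("source operands must have the same element count")
    return [
        b if (writemask >> i) & 1 else a
        for i, (a, b) in enumerate(zip(src1, src2))
    ]

# Mask 0b0101 takes elements 0 and 2 from the second source.
print(blend([10, 20, 30, 40], [1, 2, 3, 4], 0b0101))  # [1, 20, 3, 40]
```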
-
93.
Publication No.: US20190102191A1
Publication Date: 2019-04-04
Application No.: US15721313
Application Date: 2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara MADDURI , Elmoustapha OULD-AHMED-VALL , Robert VALENTINE , Jesus CORBAL , Mark CHARNEY
IPC: G06F9/30
Abstract: Embodiments of systems, apparatuses, and methods for dual complex number by complex conjugate multiplication in a processor are described. For example, execution circuitry executes a decoded instruction to multiplex data values from a plurality of packed data element positions in the first and second packed data source operands to at least one multiplier circuit, the first and second packed data source operands including a plurality of pairs of complex numbers, each pair of complex numbers including data values at shared packed data element positions in the first and second packed data source operands; calculate a real part and an imaginary part of a product of a first complex number and a complex conjugate of a second complex number; and store the real result to a first packed data element position in the destination operand and store the imaginary result to a second packed data element position in the destination operand.
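The arithmetic behind this abstract is the identity (a + bi) · conj(c + di) = (ac + bd) + (bc − ad)i. The sketch below applies it to packed operands. The function name `mul_by_conjugate` and the interleaved [real, imag, real, imag, ...] layout are assumptions for illustration; the abstract only says that the real and imaginary results go to two packed positions.

```python
def mul_by_conjugate(src1, src2):
    """Multiply each complex number in src1 by the conjugate of its partner in src2.

    Assumed layout: element 2k is the real part and element 2k+1 the
    imaginary part of the k-th complex number.
    """
    if len(src1) != len(src2) or len(src1) % 2:
        raise ValueError("operands must hold the same even number of elements")
    dst = []
    for k in range(0, len(src1), 2):
        a, b = src1[k], src1[k + 1]   # first complex number  a + bi
        c, d = src2[k], src2[k + 1]   # second complex number c + di
        dst.append(a * c + b * d)     # real part of (a + bi)(c - di)
        dst.append(b * c - a * d)     # imaginary part
    return dst

# Two complex pairs per operand: (1+2i, 3+4i) and (5+6i, 7+8i).
print(mul_by_conjugate([1, 2, 3, 4], [5, 6, 7, 8]))  # [17, 4, 53, 4]
```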
-
Publication No.: US20190095202A1
Publication Date: 2019-03-28
Application No.: US16139393
Application Date: 2018-09-24
Applicant: Intel Corporation
Inventor: Rama Kishan V. MALLADI , Elmoustapha OULD-AHMED-VALL
IPC: G06F9/30
Abstract: Embodiments of systems, apparatuses, and methods for broadcast arithmetic in a processor are described. For example, execution circuitry executes a decoded instruction to broadcast a data value from a least significant packed data element position of a first packed data source operand to a plurality of arithmetic circuits and for each packed data element position of a second packed data source operand, other than a least significant packed data element position, perform the arithmetic operation defined by the instruction on a data value from that packed data element position of the second packed data source operand and all data values from packed data element positions of the second packed data source operand that are of lesser position significance to the broadcast data value from the least significant packed data element position of the first packed data source operand, and stores a result of each arithmetic operation into a packed data element position of the packed data destination operand that corresponds to a most significant packed data element position of the second packed data source operand.
-
Publication No.: US20190042257A1
Publication Date: 2019-02-07
Application No.: US16144902
Application Date: 2018-09-27
Applicant: Intel Corporation
Inventor: Dan BAUM , Michael ESPIG , James GUILFORD , Wajdi K. FEGHALI , Raanan SADE , Christopher J. HUGHES , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY , Vinodh GOPAL , Ronen ZOHAR , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
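The first of the two compression options in this abstract, packing non-zero elements and recording their positions in a header, can be sketched as follows. The names `compress` and `decompress`, the list-of-lists matrix, and the header as a list of (row, column) tuples are all assumptions for illustration; the abstract does not specify the header encoding.

```python
def compress(matrix):
    """Pack non-zero elements together and record each one's position."""
    header, values = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                header.append((r, c))  # matrix position of this element
                values.append(v)       # packed non-zero value
    return header, values

def decompress(header, values, rows, cols):
    """Rebuild the full matrix from the header and the packed values."""
    matrix = [[0] * cols for _ in range(rows)]
    for (r, c), v in zip(header, values):
        matrix[r][c] = v
    return matrix

source = [[0, 7, 0],
          [3, 0, 0]]
header, values = compress(source)
print(header, values)  # [(0, 1), (1, 0)] [7, 3]
assert decompress(header, values, 2, 3) == source
```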
-
Publication No.: US20190042202A1
Publication Date: 2019-02-07
Application No.: US16144889
Application Date: 2018-09-27
Applicant: Intel Corporation
Inventor: Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Simon RUBANOVICH , Amit GRADSTEIN , Zeev SPERBER , Bret TOLL , Jesus CORBAL , Christopher J. HUGHES , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL
IPC: G06F7/78 , G06F9/30 , G06F9/38 , G06F15/173
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.
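The row-to-column mapping for the two source tiles can be modeled directly. This is an illustrative sketch only: `transpose_tile` and `transpose_pair` are hypothetical names, and tiles are represented as lists of equal-length rows.

```python
def transpose_tile(src):
    """Turn each row of a rectangular tile into the corresponding column."""
    return [list(col) for col in zip(*src)]

def transpose_pair(src1, src2):
    """Model one instruction that transposes two source tiles into two destinations."""
    return transpose_tile(src1), transpose_tile(src2)

dst1, dst2 = transpose_pair([[1, 2, 3],
                             [4, 5, 6]],
                            [[7, 8],
                             [9, 10]])
print(dst1)  # [[1, 4], [2, 5], [3, 6]]
print(dst2)  # [[7, 9], [8, 10]]
```

Note that a 2x3 source yields a 3x2 destination, which is why the abstract stresses that the tiles are treated as rectangular rather than square.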
-
Publication No.: US20180052686A1
Publication Date: 2018-02-22
Application No.: US15785030
Application Date: 2017-10-16
Applicant: Intel Corporation
Inventor: Elmoustapha OULD-AHMED-VALL , Robert VALENTINE
IPC: G06F9/30
CPC classification number: G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/30098
Abstract: An apparatus and method are described for performing a bit reversal and permutation on mask values. For example, a processor is described to execute an instruction to perform the operations of: reading a plurality of mask bits stored in a source mask register, the mask bits associated with vector data elements of a vector register; and performing a bit reversal operation to copy each mask bit from a source mask register to a destination mask register, wherein the bit reversal operation causes bits from the source mask register to be reversed within the destination mask register resulting in a symmetric, mirror image of the original bit arrangement.
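The mirror-image bit reversal described here can be shown with an integer standing in for the mask register. This is a minimal sketch: the name `reverse_mask` and the explicit `width` parameter (the number of mask bits in use) are assumptions, not details from the abstract.

```python
def reverse_mask(mask, width):
    """Copy each mask bit i of the source to bit (width - 1 - i) of the result."""
    out = 0
    for i in range(width):
        if (mask >> i) & 1:
            out |= 1 << (width - 1 - i)
    return out

# An 8-bit mask 00001011 becomes its mirror image 11010000.
print(format(reverse_mask(0b00001011, 8), "08b"))  # 11010000
```

Applying the operation twice with the same width returns the original mask, which is a quick sanity check on any implementation.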
-
Publication No.: US20170220350A1
Publication Date: 2017-08-03
Application No.: US15487080
Application Date: 2017-04-13
Applicant: Intel Corporation
Inventor: Elmoustapha OULD-AHMED-VALL , Robert VALENTINE
IPC: G06F9/30
CPC classification number: G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/30098
Abstract: An apparatus and method are described for performing a bit reversal and permutation on mask values. For example, a processor is described to execute an instruction to perform the operations of: reading a plurality of mask bits stored in a source mask register, the mask bits associated with vector data elements of a vector register; and performing a bit reversal operation to copy each mask bit from a source mask register to a destination mask register, wherein the bit reversal operation causes bits from the source mask register to be reversed within the destination mask register resulting in a symmetric, mirror image of the original bit arrangement.
-