-
11.
公开(公告)号:US20230102279A1
公开(公告)日:2023-03-30
申请号:US17485363
申请日:2021-09-25
Applicant: Intel Corporation
Inventor: Menachem ADELMAN , Robert VALENTINE , Dan BAUM , Amit GRADSTEIN , Simon RUBANOVICH , Regev SHEMY , Zeev SPERBER , Alexander HEINECKE , Christopher HUGHES , Evangelos GEORGANAS , Mark CHARNEY , Arik NARKIS , Rinat RAPPOPORT , Barukh ZIV , Yaroslav POLLAK , Nilesh JAIN , Yash AKHAURI , Brinda GANESH , Rajesh POORNACHANDRAN , Guy BOUDOUKH
Abstract: Systems, methods, and apparatuses relating sparsity based FMA. In some examples, an instance of a single FMA instruction has one or more fields for an opcode, one or more fields to identify a source/destination matrix operand, one or more fields to identify a first plurality of source matrix operands, one or more fields to identify a second plurality of matrix operands, wherein the opcode is to indicate that execution circuitry is to select a proper subset of data elements from the first plurality of source matrix operands based on sparsity controls from a first matrix operand of the second plurality of matrix operands and perform a FMA.
-
公开(公告)号:US20230067810A1
公开(公告)日:2023-03-02
申请号:US17463405
申请日:2021-08-31
Applicant: Intel Corporation
Inventor: Alexander HEINECKE , Menachem ADELMAN , Robert VALENTINE , Zeev SPERBER , Amit GRADSTEIN , Mark CHARNEY , Evangelos GEORGANAS , Dhiraj KALAMKAR , Christopher HUGHES , Cristina ANDERSON
Abstract: Techniques for performing BF16 FMA in response to an instruction are described. In some examples, an instruction has fields for an opcode, an identification of location of a packed data source/destination operand (a first source), an identification of a location of a second packed data source operand, an identification of a location of a third packed data source operand, and an identification of location of a packed data source/destination operand, wherein the opcode is to indicate operand ordering and that execution circuitry is to, per data element position, perform a BF16 value fused multiply-accumulate operation using the first, second, and third source operands and store a result in a corresponding data element position of the source/destination operand
-
公开(公告)号:US20230061618A1
公开(公告)日:2023-03-02
申请号:US17463374
申请日:2021-08-31
Applicant: Intel Corporation
Inventor: Menachem ADELMAN , Alexander HEINECKE , Robert VALENTINE , Zeev SPERBER , Amit GRADSTEIN , Mark CHARNEY , Evangelos GEORGANAS , Dhiraj KALAMKAR , Christopher HUGHES , Cristina ANDERSON
Abstract: Techniques for performing square root or reciprocal square root calculations on BF16 data elements in response to an instruction are described. An example of an instruction is one that includes fields for an opcode, an identification of a location of a packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a calculation of a square root value of a BF16 data element in that position and store a result of each square root into a corresponding data element position of the packed data destination operand.
-
公开(公告)号:US20220308873A1
公开(公告)日:2022-09-29
申请号:US17214853
申请日:2021-03-27
Applicant: Intel Corporation
Inventor: Menachem ADELMAN , Robert VALENTINE , Amit GRADSTEIN , Daniel TOWNER , Mark CHARNEY
IPC: G06F9/30
Abstract: Systems, methods, and apparatuses relating to interleaving data values. An embodiment includes decoding circuitry to decode a single instruction, the instruction having one or more fields to specify an opcode, one or more fields to specify a location of a first source operand, one or more fields to specify a location of a second source operand, one or more fields to specify a location of a destination operand, and one or more fields to specify an index value to be used to index a row in the first source operand, wherein the opcode is to indicate execution circuitry is to downconvert data elements of the indexed row of the first source operand, interleave the downconverted elements with data elements of the second source operand, and store the interleaved elements in the destination operand; and execution circuitry to execute the decoded instruction according to the opcode.
-
15.
公开(公告)号:US20190102191A1
公开(公告)日:2019-04-04
申请号:US15721313
申请日:2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara MADDURI , Elmoustapha OULD-AHMED-VALL , Robert VALENTINE , Jesus CORBAL , Mark CHARNEY
IPC: G06F9/30
Abstract: Embodiments of systems, apparatuses, and methods for dual complex number by complex conjugate multiplication in a processor are described. For example, execution circuitry executes a decoded instruction to multiplex data values from a plurality of packed data element positions in the first and second packed data source operands to at least one multiplier circuit, the first and second packed data source operands including a plurality of pairs complex numbers, each pair of complex numbers including data values at shared packed data element positions in the first and second packed data source operands; calculate a real part and an imaginary part of a product of a first complex number and a complex conjugate of a second complex number; and store the real result to a first packed data element position in the destination operand and store the imaginary result to a second packed data element position in the destination operand.
-
公开(公告)号:US20240427600A1
公开(公告)日:2024-12-26
申请号:US18827453
申请日:2024-09-06
Applicant: Intel Corporation
Inventor: Robert C. VALENTINE , Jesus Corbal SAN ADRIAN , Roger Espasa SANS , Robert D. CAVIN , Bret L. TOLL , Santiago Galan DURAN , Jeffrey G. WIEDEMEIER , Sridhar SAMUDRALA , Milind Baburao GIRKAR , Edward Thomas GROCHOWSKI , Jonathan Cannon HALL , Dennis R. BRADFORD , Elmoustapha OULD-AHMED-VALL , James C ABEL , Mark CHARNEY , Seth ABRAHAM , Suleyman SAIR , Andrew Thomas FORSYTH , Lisa WU , Charles YOUNT
IPC: G06F9/30 , G06F9/34 , H01L29/66 , H01L29/775 , H01L29/78 , H01L29/786
Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
-
17.
公开(公告)号:US20240248722A1
公开(公告)日:2024-07-25
申请号:US18626629
申请日:2024-04-04
Applicant: Intel Corporation
Inventor: Eliezer WEISSMANN , Mark CHARNEY , Michael MISHAELI , Robert VALENTINE , Itai RAVID , Jason W. BRANDT , Gilbert NEIGER , Baruch CHAIKIN , Efraim ROTEM
CPC classification number: G06F9/3851 , G06F9/30043 , G06F9/30076 , G06F9/30101 , G06F9/3836 , G06F9/3842
Abstract: Systems, methods, and apparatuses relating to instructions to reset software thread runtime property histories in a hardware processor are described. In one embodiment, a hardware processor includes a hardware guide scheduler comprising a plurality of software thread runtime property histories; a decoder to decode a single instruction into a decoded single instruction, the single instruction having a field that identifies a model-specific register; and an execution circuit to execute the decoded single instruction to check that an enable bit of the model-specific register is set, and when the enable bit is set, to reset the plurality of software thread runtime property histories of the hardware guide scheduler.
-
公开(公告)号:US20240184585A1
公开(公告)日:2024-06-06
申请号:US18436982
申请日:2024-02-08
Applicant: Intel Corporation
Inventor: Alexander HEINECKE , Menachem ADELMAN , Robert VALENTINE , Zeev SPERBER , Amit GRADSTEIN , Mark CHARNEY , Evangelos GEORGANAS , Dhiraj KALAMKAR , Christopher HUGHES , Cristina ANDERSON
IPC: G06F9/30
CPC classification number: G06F9/3016 , G06F9/30014 , G06F9/30036 , G06F9/30101
Abstract: Techniques for comparing BF16 data elements are described. An exemplary BF16 comparison instruction includes fields for an opcode, an identification of a location of a first packed data source operand, and an identification of a location of a second packed data source operand, wherein the opcode is to indicate that execution circuitry is to perform, for a particular data element position of the packed data source operands, a comparison of a data element at that position, and update a flags register based on the comparison.
-
公开(公告)号:US20230205527A1
公开(公告)日:2023-06-29
申请号:US17560547
申请日:2021-12-23
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Wing Shek WONG , Jonathan COMBS , Mark CHARNEY
IPC: G06F9/30
CPC classification number: G06F9/30145 , G06F9/30098 , G06F9/30025
Abstract: Techniques for data type conversion using an instruction are described. An exemplary instruction includes fields for an opcode, an identification of source operands, and an identification of destination operand, wherein the opcode is to indicate execution circuitry and/or memory access circuitry is to convert 32-bit floating point values from the identified source operands into 16-bit floating point values and store 16-bit floating point values in data element positions of the identified destination operand.
-
公开(公告)号:US20230004393A1
公开(公告)日:2023-01-05
申请号:US17359552
申请日:2021-06-26
Applicant: Intel Corporation
Inventor: Venkateswara Rao MADDURI , Robert VALENTINE , Mark CHARNEY , Cristina ANDERSON
Abstract: Apparatus and method for signed and unsigned shift, round and saturate using different data element values. For example, one embodiment of an apparatus comprises a decoder to decode an instruction having fields for a first packed data source operand to provide a first source data element and a second source data element, a second packed data source operand or immediate to provide a first shift value and a second shift value corresponding to the first source data element and second source data element, respectively, and a packed data destination operand to indicate a first result value and a second result value corresponding to the first source data element and second source data element, and execution circuitry to execute the decoded instruction to: shift the first source data element by an amount based on the first shift value to generate a first shifted data element; shift the second source data element by an amount based on the second shift value to generate a second shifted data element; update a saturation indicator responsive to detecting a saturation condition resulting from the shift of the first and/or second source data elements; round and/or saturate the first and second shifted data elements in accordance with a specified rounding mode and the saturation indicator, respectively, to generate the first and second result data elements; and store the first result value and the second result value in a first data element location and a second data element location in a destination register.
-
-
-
-
-
-
-
-
-