-
Publication No.: US20240045689A1
Publication date: 2024-02-08
Application No.: US17958377
Filing date: 2022-10-01
Applicant: Intel Corporation
Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
CPC classification: G06F9/3016, G06F7/4876, G06F17/16, G06F9/3802, G06F9/3013, G06F9/3001
Abstract: Disclosed embodiments relate to systems and methods for performing 8-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply pairs of 8-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination; decode circuitry to decode the fetched instruction; and execution circuitry to respond to the decoded instruction as specified by the opcode.
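The accumulate step in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not fix the 8-bit encoding or the number of pairs per destination element, so the sketch assumes the E5M2 layout (1 sign, 5 exponent, 2 mantissa bits), four pairs per single-precision element, and a single rounding to single precision per destination element. The function names are hypothetical.

```python
import struct


def bf8_to_float(byte):
    """Decode an E5M2 byte (assumed format) as the high byte of an IEEE half."""
    return struct.unpack("<e", bytes([0, byte & 0xFF]))[0]


def to_f32(value):
    """Round a Python float to single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def fp8_dot_accumulate(src1, src2, dest, group=4):
    """Multiply pairs of 8-bit floats and add them to each destination element.

    src1, src2: lists of E5M2 bytes, `group` bytes per destination element.
    dest: list of previous single-precision accumulator values.
    group=4 is an assumption, not something the abstract states.
    """
    result = []
    for lane, previous in enumerate(dest):
        total = previous
        for k in range(group):
            a = bf8_to_float(src1[lane * group + k])
            b = bf8_to_float(src2[lane * group + k])
            total += a * b
        # Assumed: one rounding to single precision per destination element.
        result.append(to_f32(total))
    return result


# 1.0, 2.0, 0.5, -1.0 against four copies of 2.0, accumulated onto 10.0
print(fp8_dot_accumulate([0x3C, 0x40, 0x38, 0xBC], [0x40] * 4, [10.0]))
```

Real hardware may order and round the intermediate sums differently; the sketch only shows the data flow from 8-bit pairs to a single-precision accumulator.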
-
Publication No.: US20240045684A1
Publication date: 2024-02-08
Application No.: US17958380
Filing date: 2022-10-01
Applicant: Intel Corporation
Inventors: Alexander Heinecke, Menachem Adelman, Mark Charney, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber, Robert Valentine
IPC classification: G06F9/30
CPC classification: G06F9/30145, G06F9/30036, G06F9/30018
Abstract: Techniques for converting FP16 to BF8 using bias are described. An example embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand.
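One plausible reading of a biased FP16-to-BF8 conversion is sketched below in Python. The abstract does not say how the bias terms are applied, so everything about the arithmetic here is an assumption: the bias is taken to be an 8-bit unsigned integer added to the FP16 bit pattern before the low 8 bits are dropped (which yields stochastic-style rounding when the bias is random), NaN and infinity bypass the bias, and the two sources are simply concatenated. All function names are hypothetical.

```python
import struct


def fp16_bits(value):
    """Return the 16-bit IEEE half-precision pattern of a Python float."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def fp16_to_bf8_biased(value, bias):
    """Convert one half-precision value to an E5M2 byte using a bias term.

    Assumed behaviour (not stated in the abstract): add the 8-bit bias to
    the magnitude bits, then keep the upper 8 bits of the result.
    """
    bits = fp16_bits(value)
    sign = bits & 0x8000
    magnitude = bits & 0x7FFF
    if magnitude > 0x7C00:           # NaN: emit a quiet NaN, ignore bias
        return (sign >> 8) | 0x7E
    if magnitude == 0x7C00:          # infinity: pass through, ignore bias
        return bits >> 8
    magnitude += bias & 0xFF         # may carry into the kept bits
    return (sign | magnitude) >> 8   # truncate the low 8 bits


def convert_pair(src1, src2, biases):
    """Convert two FP16 vectors into one BF8 vector, one bias per element.

    Placing src1 before src2 in the result is an assumption.
    """
    return [fp16_to_bf8_biased(v, b) for v, b in zip(src1 + src2, biases)]


# 1.125 lies halfway between the BF8 neighbours 1.0 (0x3C) and 1.25 (0x3D)
print(hex(fp16_to_bf8_biased(1.125, 0x7F)), hex(fp16_to_bf8_biased(1.125, 0x80)))
```

Under this reading, half of all possible bias values round 1.125 up and half round it down, which matches its position between the two representable neighbours.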
-
Publication No.: US11768681B2
Publication date: 2023-09-26
Application No.: US15879419
Filing date: 2018-01-24
Applicant: Intel Corporation
CPC classification: G06F9/3001, G06F9/3013, G06F9/30014, G06F9/3016, G06F9/30018, G06F9/30036, G06F9/3893
Abstract: An apparatus and method for performing multiply-accumulate operations. For example, one embodiment of a processor comprises: a decoder to decode instructions; a first source register to store a first plurality of packed bytes; a second source register to store a second plurality of packed bytes; a third source register to store a plurality of packed doublewords; execution circuitry to execute a first instruction, the execution circuitry comprising: extension circuitry to sign-extend or zero-extend the first and second plurality of packed bytes to generate a first and second plurality of words corresponding to the first and second plurality of packed bytes; multiplier circuitry to multiply each of the first plurality of words with a corresponding one of the second plurality of words to generate a plurality of temporary products; adder circuitry to add at least a first set of the temporary products to generate a first temporary sum; accumulation circuitry to combine the first temporary sum with a first packed doubleword value from a first doubleword location in the third source register to generate a first accumulated doubleword result; and a destination register to store the first accumulated doubleword result in the first doubleword location.
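The extend, multiply, add, and accumulate stages listed in this abstract can be illustrated with a short Python sketch. Several details are assumptions rather than statements from the abstract: four byte pairs are summed per doubleword, the first source is zero-extended and the second sign-extended by default, and the accumulated result wraps to 32 bits instead of saturating. The function names are hypothetical.

```python
def extend8(byte, signed):
    """Sign-extend or zero-extend one packed byte to a wider integer."""
    byte &= 0xFF
    return byte - 256 if signed and byte >= 128 else byte


def wrap32(value):
    """Wrap an integer to a signed 32-bit doubleword (assumed, no saturation)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def byte_multiply_accumulate(src1, src2, acc, signed1=False, signed2=True, group=4):
    """Multiply byte pairs, sum each group, and add the sum to a doubleword.

    src1, src2: packed bytes, `group` bytes per doubleword (group=4 assumed).
    acc: packed doublewords from the third source register.
    """
    result = []
    for lane, previous in enumerate(acc):
        base = lane * group
        # Temporary products of the extended words.
        products = [
            extend8(src1[base + k], signed1) * extend8(src2[base + k], signed2)
            for k in range(group)
        ]
        # Temporary sum combined with the previous doubleword value.
        result.append(wrap32(previous + sum(products)))
    return result


# 1*(-1) + 2*2 + 3*3 + 4*4 = 28, accumulated onto 100
print(byte_multiply_accumulate([1, 2, 3, 4], [0xFF, 2, 3, 4], [100]))
```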
-
Publication No.: US20230128680A1
Publication date: 2023-04-27
Application No.: US18069178
Filing date: 2022-12-20
Applicant: Intel Corporation
Inventors: Marcos Emanuel Carranza, Cesar Martinez-Spessot, Mats Agerstam, Maria Ramirez Loaiza, Alexander Heinecke, Justin Gottschlich
IPC classification: G06N20/10, G06F18/213, G06F18/232, G06F8/41, G06N20/20
Abstract: Methods, apparatus, systems and articles of manufacture to provide machine assisted programming are disclosed. An example apparatus includes processor circuitry to execute computer readable instructions to: execute a machine learning model to generate a first code recommendation for programming code, the first code recommendation being associated with security of the programming code; cause output of the first code recommendation via a user interface; update the machine learning model based on feedback obtained via the user interface; determine a performance of the programming code; generate a second code recommendation, the second code recommendation being associated with the performance of the programming code; and cause output of the second code recommendation via the user interface.
-
Publication No.: US11544057B2
Publication date: 2023-01-03
Application No.: US17069230
Filing date: 2020-10-13
Applicant: INTEL CORPORATION
Inventors: Gregory Henry, Alexander Heinecke
Abstract: Embodiments detailed herein relate to arithmetic operations on floating-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands whose values are in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution includes to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.
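The idea of splitting each operand into several lower-precision values, operating on the pieces, and converting the result back can be sketched in Python. This is a generic illustration, not the patented method: the number of pieces (3), the width of each piece (about 8 bits), the use of multiplication as the arithmetic operation, and the function names are all assumptions.

```python
import math


def split_value(value, pieces=3, bits=8):
    """Split a float into low-precision pieces plus one stored exponent.

    value is approximately sum(parts) * 2**exponent; each part carries
    roughly `bits` significant bits.
    """
    mantissa, exponent = math.frexp(value)  # value == mantissa * 2**exponent
    parts = []
    residual = mantissa
    scale = 1.0
    for _ in range(pieces):
        scale *= 2.0 ** bits
        part = round(residual * scale) / scale  # keep the next `bits` bits
        parts.append(part)
        residual -= part                        # what the pieces still miss
    return parts, exponent


def multiply_split(a, b, pieces=3, bits=8):
    """Multiply two floats using only products of their low-precision pieces."""
    parts_a, exp_a = split_value(a, pieces, bits)
    parts_b, exp_b = split_value(b, pieces, bits)
    total = sum(pa * pb for pa in parts_a for pb in parts_b)
    # Convert back to an ordinary float using the stored exponents.
    return math.ldexp(total, exp_a + exp_b)


print(multiply_split(1.5, 2.25))
```

With three 8-bit pieces the result carries about 24 bits of precision, comparable to single precision; a hardware design could also skip the smallest cross products to save work, at some cost in accuracy.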
-
Publication No.: US20210406018A1
Publication date: 2021-12-30
Application No.: US16914347
Filing date: 2020-06-27
Applicant: INTEL CORPORATION
Inventors: Menachem Adelman, Robert Valentine, Barukh Ziv, Yaroslav Pollak, Gideon Stupp, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Mark Charney, Christopher Hughes, Alexander Heinecke
Abstract: Systems, methods, and apparatuses relating to one or more instructions that utilize direct paths for loading data into a tile from a vector register and/or storing data from a tile into a vector register are described. In one embodiment, a system includes a matrix operations accelerator circuit comprising a two-dimensional grid of processing elements, a plurality of registers that represents a two-dimensional matrix coupled to the two-dimensional grid of processing elements, and a coupling to a cache; and a hardware processor core comprising: a vector register, a decoder to decode a single instruction into a decoded single instruction, the single instruction including a first field that identifies the two-dimensional matrix, a second field that identifies a set of elements of the two-dimensional matrix, and a third field that identifies the vector register, and an execution circuit to execute the decoded single instruction to cause a store of the set of elements from the plurality of registers that represents the two-dimensional matrix into the vector register by a coupling of the hardware processor core to the matrix operations accelerator circuit that is separate from the coupling to the cache.
-
Publication No.: US11093247B2
Publication date: 2021-08-17
Application No.: US15858932
Filing date: 2017-12-29
Applicant: Intel Corporation
Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
-
Publication No.: US10853554B2
Publication date: 2020-12-01
Application No.: US16456825
Filing date: 2019-06-28
Applicant: Intel Corporation
Inventors: Javier Sebastian Turek, Javier Felip Leon, Alexander Heinecke, Evangelos Georganas, Luis Carlos Maria Remis, Ignacio Javier Alvarez, David Israel Gonzalez Aguirre, Shengtian Zhou, Justin Gottschlich
IPC classification: G06F30/30, G06F30/398, G06N3/04, G06N3/08
Abstract: Systems and methods for determining a configuration for a microarchitecture are described herein. An example system includes a proposal generator to generate a first candidate configuration of parameters for the microarchitecture, a machine learning model to process the first candidate configuration of parameters to output estimated performance indicators for the microarchitecture, an uncertainty checker to determine whether the estimated performance indicators are reliable, and a performance checker. In response to a determination that the estimated performance indicators are reliable, the performance checker is to determine whether the estimated performance indicators have improved toward a target. Further, if the estimated performance indicators have improved, the performance checker is to store the first candidate configuration of parameters in a memory as a potential solution for a microarchitecture without performing a full simulation on the first candidate configuration of parameters.
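The control flow in this abstract (propose, estimate, check reliability, check improvement, store) might be sketched in Python as shown here. The callables `propose` and `estimate`, the scalar performance indicator, the uncertainty threshold, and the choice to skip unreliable candidates are all assumptions made for illustration; the abstract does not describe what happens to unreliable estimates.

```python
def search_configurations(propose, estimate, target, steps, max_uncertainty):
    """Collect candidate configurations whose estimates move toward a target.

    propose(): returns a candidate configuration (hypothetical generator).
    estimate(candidate): returns (predicted_indicator, uncertainty) from a
        surrogate model (hypothetical interface).
    """
    solutions = []
    best_gap = float("inf")
    for _ in range(steps):
        candidate = propose()
        predicted, uncertainty = estimate(candidate)
        # Uncertainty checker: only trust reliable estimates.
        if uncertainty > max_uncertainty:
            continue  # assumed handling; the abstract leaves this case open
        # Performance checker: has the estimate improved toward the target?
        gap = abs(target - predicted)
        if gap < best_gap:
            best_gap = gap
            solutions.append(candidate)  # stored without a full simulation
    return solutions


# Toy usage with a fixed sequence of candidates and a made-up surrogate.
candidates = iter([{"cache_kb": 32}, {"cache_kb": 64}, {"cache_kb": 128}, {"cache_kb": 16}])


def toy_estimate(config):
    uncertainty = 0.9 if config["cache_kb"] == 128 else 0.1
    return config["cache_kb"] / 10.0, uncertainty


print(search_configurations(lambda: next(candidates), toy_estimate, 10.0, 4, 0.5))
```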
-
Publication No.: US20200026745A1
Publication date: 2020-01-23
Application No.: US16586114
Filing date: 2019-09-27
Applicant: Intel Corporation
Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits that is switchable from a first mode where a respective output of each of a first proper subset of fused multiply accumulate circuits of the two-dimensional grid is transmitted downstream to a respective input of each of a second proper subset of fused multiply accumulate circuits of the two-dimensional grid to form output values from at least one first input two-dimensional matrix and at least one second input two-dimensional matrix, and store the output values in resultant storage, to a second mode where the respective output of each of the first proper subset of fused multiply accumulate circuits of the two-dimensional grid form first output values from a first subset of the at least one first input two-dimensional matrix and the at least one second input two-dimensional matrix, and store the first output values in the resultant storage, and a respective output of each of the second proper subset of fused multiply accumulate circuits of the two-dimensional grid form second output values from a second subset of the at least one first input two-dimensional matrix and the at least one second input two-dimensional matrix, and store the second output values in the resultant storage.
-
Publication No.: US20190324727A1
Publication date: 2019-10-24
Application No.: US16455358
Filing date: 2019-06-27
Applicant: Intel Corporation
Inventors: Marcos Carranza, Mats Agerstam, Justin Gottschlich, Alexander Heinecke, Cesar Martinez-Spessot, Maria Ramirez Loaiza, Mohammad Mejbah Ul Alam, Shengtian Zhou
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for code review assistance for dynamically typed languages. An example apparatus to analyze a segment of code includes a function identifier to identify a first input of a first function call included in the segment of the code, a parameter type vector (PTV) estimator model to estimate a first data structure based on the first input, the PTV estimator model generated via a set of reviewed code, a PTV determiner to generate a second data structure based on a data parameter type of the first input, an error comparator to determine a first reconstruction error based on the first data structure and the second data structure, and a recommendation generator to, if the first reconstruction error does not satisfy a recommendation threshold, generate a first recommendation to review the first function call.
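The comparison between the estimated and the determined parameter type vectors could look like the following Python sketch. The abstract does not define the data structures, the error metric, or what it means to "satisfy" the threshold, so the sketch assumes equal-length numeric vectors, mean squared error, and that an error above the threshold triggers a recommendation. Function names are hypothetical.

```python
def reconstruction_error(estimated, observed):
    """Mean squared error between two parameter type vectors (assumed metric)."""
    if len(estimated) != len(observed):
        raise ValueError("vectors must have the same length")
    return sum((e - o) ** 2 for e, o in zip(estimated, observed)) / len(estimated)


def review_recommendation(call_name, estimated, observed, threshold):
    """Return a review message when the error exceeds the threshold, else None.

    Treating "error > threshold" as "does not satisfy" is an assumption.
    """
    error = reconstruction_error(estimated, observed)
    if error > threshold:
        return f"Review call to {call_name}: reconstruction error {error:.3f}"
    return None


# The model expected one type pattern, but the call site shows another.
print(review_recommendation("parse", [1.0, 0.0], [0.0, 1.0], 0.1))
```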