-
公开(公告)号:US20240004662A1
公开(公告)日:2024-01-04
申请号:US17856978
申请日:2022-07-02
Applicant: Intel Corporation
Inventor: Menachem ADELMAN , Amit GRADSTEIN , Regev SHEMY , Chitra NATARAJAN , Leonardo BORGES , Chytra SHIVASWAMY , Igor ERMOLAEV , Michael ESPIG , Or BEIT AHARON , Jeff WIEDEMEIER
IPC: G06F9/30
CPC classification number: G06F9/30185 , G06F9/30025 , G06F9/30021
Abstract: Techniques for performing horizontal reductions are described. In some examples, an instance of a horizontal instruction is to include at least one field for an opcode, one or more fields to reference a first source operand, and one or more fields to reference a destination operand, wherein the opcode is to indicate that execution circuitry is, in response to a decoded instance of the single instruction, to at least perform a horizontal reduction using at least one data element of a non-masked data element position of at least the first source operand and store a result of the horizontal reduction in the destination operand.
-
22.
公开(公告)号:US20220414182A1
公开(公告)日:2022-12-29
申请号:US17359519
申请日:2021-06-26
Applicant: Intel Corporation
Inventor: Menachem ADELMAN , Robert VALENTINE , Zeev SPERBER , Amit GRADSTEIN , Simon RUBANOVICH , Sagi MELLER , Christopher HUGHES , Evangelos GEORGANAS , Alexander HEINECKE , Mark CHARNEY
Abstract: Techniques for matrix multiplication are described. In some examples, decode circuitry is to decode a single instruction having fields for an opcode, an indication of a location of a first source operand, an indication of a location of a second source operand, and an indication of a location of a destination operand, wherein the opcode is to indicate that execution circuitry is to at least convert data elements of the first and second source operands from a first floating point representation to a second floating point representation, perform matrix multiplication with the converted data elements, and accumulate results of the matrix multiplication in the destination operand in the first floating point representation; and the execution circuitry is to execute to the decoded instruction as specified by the opcode.
-
23.
公开(公告)号:US20220326948A1
公开(公告)日:2022-10-13
申请号:US17851468
申请日:2022-06-28
Applicant: Intel Corporation
Inventor: Alexander F. HEINECKE , Robert VALENTINE , Mark J. CHARNEY , Raanan SADE , Menachem ADELMAN , Zeev SPERBER , Amit GRADSTEIN , Simon RUBANOVICH
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
-
24.
公开(公告)号:US20210286620A1
公开(公告)日:2021-09-16
申请号:US17216566
申请日:2021-03-29
Applicant: Intel Corporation
Inventor: Alexander F. HEINECKE , Robert VALENTINE , Mark J. CHARNEY , Raanan SADE , Menachem ADELMAN , Zeev SPERBER , Amit GRADSTEIN , Simon RUBANOVICH
Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
-
公开(公告)号:US20200310793A1
公开(公告)日:2020-10-01
申请号:US16369743
申请日:2019-03-29
Applicant: Intel Corporation
Inventor: Simon RUBANOVICH , Amit GRADSTEIN , Zeev SPERBER
IPC: G06F9/30
Abstract: Disclosed embodiments relate to an interleaved pipeline of floating-point (FP) adders. In one example, a processor is to execute an instruction specifying an opcode and locations of a M by K first source matrix, a K by N second source matrix, and a M by N destination matrix, the opcode indicating execution circuitry, for each FP element (M, N) of the destination matrix, is to: launch K instances of a pipeline having a first, MULTIPLY stage, during which a FP element (M, K) of the first source matrix and a corresponding FP element (K, N) of the second source matrix are multiplied; concurrently, in an EXPDIFF stage, determine an exponent difference between the product and a previous FP value of the element (M, N) of the destination matrix; and in a second, ADD-BYPASS stage, accumulate the product with the previous FP value and, concurrently, bypassing the accumulated sum to a subsequent pipeline instance.
-
公开(公告)号:US20200310757A1
公开(公告)日:2020-10-01
申请号:US16369629
申请日:2019-03-29
Applicant: Intel Corporation
Inventor: Amit GRADSTEIN , Simon RUBANOVICH , Zeev SPERBER
Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.
-
27.
公开(公告)号:US20190303142A1
公开(公告)日:2019-10-03
申请号:US15941531
申请日:2018-03-30
Applicant: Intel Corporation
Inventor: Raanan SADE , Thierry PONS , Amit GRADSTEIN , Zeev SPERBER , Mark J. CHARNEY , Robert VALENTINE , Eyal Oz-Sinay
Abstract: Disclosed embodiments relate to efficient complex vector multiplication. In one example, an apparatus includes execution circuitry, responsive to an instruction having fields to specify multiplier, multiplicand, and summand complex vectors, to perform two operations: first, to generate a double-even multiplicand by duplicating even elements of the specified multiplicand, and to generate a temporary vector using a fused multiply-add (FMA) circuit having A, B, and C inputs set to the specified multiplier, the double-even multiplicand, and the specified summand, respectively, and second, to generate a double-odd multiplicand by duplicating odd elements of the specified multiplicand, to generate a swapped multiplier by swapping even and odd elements of the specified multiplier, and to generate a result using a second FMA circuit having its even product negated, and having A, B, and C inputs set to the swapped multiplier, the double-odd multiplicand, and the temporary vector, respectively.
-
公开(公告)号:US20190286441A1
公开(公告)日:2019-09-19
申请号:US16271675
申请日:2019-02-08
Applicant: INTEL CORPORATION
Inventor: Seth ABRAHAM , Elmoustapha OULD-AHMED-VALL , Robert VALENTINE , Zeev SPERBER , Amit GRADSTEIN
Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
-
29.
公开(公告)号:US20240241722A1
公开(公告)日:2024-07-18
申请号:US18619570
申请日:2024-03-28
Applicant: Intel Corporation
Inventor: Naveen MELLEMPUDI , Alexander F. HEINECKE , Robert VALENTINE , Mark J. CHARNEY , Christopher J. HUGHES , Evangelos GEORGANAS , Zeev SPERBER , Amit GRADSTEIN , Simon RUBANOVICH
CPC classification number: G06F9/30036 , G06F7/49915 , G06F9/30196 , G06F9/3887
Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. A processor embodiment includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a destination matrix having single-precision elements, a first source matrix, and a second source matrix, the source matrices having elements that each comprise a quadruple of 8-bit floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the 8-bit floating-point values to single-precision values, a multiplication of different pairs of converted single-precision values to generate plurality of results, and an accumulation of the results with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
-
30.
公开(公告)号:US20240126545A1
公开(公告)日:2024-04-18
申请号:US18397664
申请日:2023-12-27
Applicant: Intel Corporation
Inventor: Alexander F. HEINECKE , Robert VALENTINE , Mark J. CHARNEY , Raanan SADE , Menachem ADELMAN , Zeev SPERBER , Amit GRADSTEIN , Simon RUBANOVICH
CPC classification number: G06F9/30036 , G06F9/3001 , G06F9/3016 , G06F9/3802
Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
-
-
-
-
-
-
-
-
-