Systems and methods for performing 16-bit floating-point matrix dot product instructions

    Publication No.: US11614936B2

    Publication date: 2023-03-28

    Application No.: US17216566

    Filing date: 2021-03-29

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F9/38

    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify an M by N destination matrix, a first source identifier to identify an M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (m, k) of the specified first source matrix by a corresponding nibble of a doubleword element (k, n) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
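
The flow in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the abstract does not say whether nibbles are signed or how the accumulator is interpreted, so the sketch assumes unsigned 4-bit nibbles, a signed 32-bit accumulator, and saturation after each of the K steps. The function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def nibbles(dword):
    """Split a 32-bit doubleword into eight 4-bit fields, least significant first.

    Assumption: nibbles are treated as unsigned values in the range 0..15.
    """
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def saturate_int32(value):
    """Clamp a value to the signed 32-bit range (assumed accumulator format)."""
    return max(INT32_MIN, min(INT32_MAX, value))


def tile_nibble_dot_product(dest, src1, src2):
    """Accumulate nibble dot products of src1 (M x K) and src2 (K x N) into dest (M x N)."""
    m_rows, k_depth, n_cols = len(src1), len(src2), len(src2[0])
    for m in range(m_rows):
        for n in range(n_cols):
            acc = dest[m][n]  # previous contents of the destination doubleword
            for k in range(k_depth):  # the flow is performed K times per element
                a = nibbles(src1[m][k])
                b = nibbles(src2[k][n])
                products = [x * y for x, y in zip(a, b)]  # eight products
                acc = saturate_int32(acc + sum(products))
            dest[m][n] = acc
    return dest
```

For example, `tile_nibble_dot_product([[5]], [[0x11111111]], [[0x22222222]])` returns `[[21]]`: eight products of 1 × 2 are added to the previous contents, 5.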

Systems and methods for implementing chained tile operations

    Publication No.: US11416260B2

    Publication date: 2022-08-16

    Application No.: US16863951

    Filing date: 2020-04-30

    Applicant: Intel Corporation

    Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands, decode circuitry to decode the fetched instructions, and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions, dynamically select and configure a SIMD path comprising first and second processing engines (PE) to execute the first and second decoded instructions, and set aside the specified destination of the first decoded instruction, and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.

Apparatus and method of improved insert instructions

    Publication No.: US11354124B2

    Publication date: 2022-06-07

    Application No.: US15668508

    Filing date: 2017-08-03

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F12/06 G06F9/38

    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non-overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non-overlapping sections has a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non-overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non-overlapping sections has a same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and to mask the second and fourth instructions at a second resultant vector granularity.

Apparatus and method for complex multiplication

    Publication No.: US20220129264A1

    Publication date: 2022-04-28

    Application No.: US17517351

    Filing date: 2021-11-02

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F7/48

    Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.
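
The two-operation split described in this abstract can be illustrated with the identity (a + bi)(c + di) = (ac − bd) + (ad + bc)i. The sketch below is a hypothetical reading: the abstract does not state which terms belong to which operation, so the assignment (first operation uses the real part of the first source, second operation uses its imaginary part) is an assumption, as is the function name.

```python
def complex_multiply_two_step(x, y):
    """Multiply complex numbers given as (real, imag) pairs in two operations.

    Assumed split (not confirmed by the abstract):
      operation 1 produces the first terms  a*c (real) and a*d (imaginary);
      operation 2 produces the second terms -b*d (real) and b*c (imaginary).
    """
    a, b = x  # first source: a + b*i
    c, d = y  # second source: c + d*i

    # First operation: one term of each component.
    real = a * c
    imag = a * d

    # Second operation: the remaining term of each component.
    real -= b * d
    imag += b * c

    return real, imag
```

As a check, `complex_multiply_two_step((1, 2), (3, 4))` returns `(-5, 10)`, matching Python's built-in `complex(1, 2) * complex(3, 4)`.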

Context save with variable save state size

    Publication No.: US11275588B2

    Publication date: 2022-03-15

    Application No.: US16624178

    Filing date: 2017-07-01

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Embodiments are described of an apparatus comprising a decoder to decode an instruction having fields for an opcode and a destination operand, and execution circuitry to execute the decoded instruction to perform a save of processor state components to an area located at a destination memory address specified by the destination operand, wherein a size of the area is defined by at least one indication of an execution of an instruction operating on a specified group of processor states.

Apparatus and method for vector horizontal add of signed/unsigned words and doublewords

    Publication No.: US11249754B2

    Publication date: 2022-02-15

    Application No.: US15850131

    Filing date: 2017-12-21

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F7/505

    Abstract: An apparatus and method for performing a packed horizontal addition of words and doublewords. One embodiment of a processor includes a decoder to decode a packed horizontal add instruction which includes an opcode and one or more operands used to identify a plurality of packed words; a source register to store a plurality of packed words; execution circuitry to execute the decoded instruction, and a destination register to store a final result as a packed result word in a designated data element position. The execution circuitry includes operand selection circuitry to identify first and second packed words from the source register in accordance with the operands and opcode; adder circuitry to add the two packed words to generate a temporary sum; a temporary storage of at least 17 bits to store the temporary sum; and saturation circuitry to saturate the temporary sum if necessary to generate the final result.
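
The add-then-saturate step in this abstract can be sketched as follows. This is a simplified model, assuming 16-bit words and adjacent-pair selection; the abstract leaves operand selection to the opcode and operands, so the pairing rule and all function names here are hypothetical. The sum of two 16-bit words can exceed 16 bits, which is why the abstract calls for at least 17 bits of temporary storage.

```python
def saturate_word(value, signed=True):
    """Clamp a temporary sum to the 16-bit signed or unsigned range."""
    lo, hi = (-(2**15), 2**15 - 1) if signed else (0, 2**16 - 1)
    return max(lo, min(hi, value))


def horizontal_add_words(source, first, second, signed=True):
    """Add two selected packed words and saturate the result to one word.

    Python integers are unbounded; in hardware the temporary sum needs
    at least 17 bits to hold any sum of two 16-bit words.
    """
    temporary_sum = source[first] + source[second]
    return saturate_word(temporary_sum, signed)


def horizontal_add_pairs(source, signed=True):
    """Assumed pairing rule: add each adjacent pair (0,1), (2,3), ..."""
    return [
        horizontal_add_words(source, i, i + 1, signed)
        for i in range(0, len(source) - 1, 2)
    ]
```

For instance, `horizontal_add_pairs([30000, 10000, 1, 2])` returns `[32767, 3]`: the first pair overflows the signed 16-bit range and is saturated, while the second pair is stored unchanged.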

Instructions for vector multiplication of unsigned words with rounding

    Publication No.: US11221849B2

    Publication date: 2022-01-11

    Application No.: US16642778

    Filing date: 2017-09-27

    Applicant: Intel Corporation

    IPC classification: G06F9/22 G06F9/30 G06F9/38

    Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, and execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate an unsigned fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
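
A rough model of the multiply-and-round step is shown below. It assumes 16-bit unsigned elements and round-half-up (adding half of the discarded range before keeping the upper half of the product); the abstract does not name a rounding mode, so that choice, like the function names, is an assumption made for illustration.

```python
def mul_high_unsigned_round(a, b, bits=16):
    """Return the rounded upper `bits` of the unsigned product a * b.

    Assumption: rounding is round-half-up, implemented by adding
    2**(bits - 1) to the double-sized product before shifting.
    """
    mask = (1 << bits) - 1
    product = (a & mask) * (b & mask)  # double-sized product, up to 2*bits wide
    half = 1 << (bits - 1)             # half of the discarded low portion
    return ((product + half) >> bits) & mask


def vector_mul_high_unsigned_round(src1, src2, bits=16):
    """Apply the operation to each corresponding pair of elements."""
    return [mul_high_unsigned_round(a, b, bits) for a, b in zip(src1, src2)]
```

With 16-bit elements the rounded result always fits: the largest product, 0xFFFF × 0xFFFF = 0xFFFE0001, rounds to 0xFFFE.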