Systems and methods for performing 16-bit floating-point matrix dot product instructions

    Publication No.: US11614936B2

    Publication Date: 2023-03-28

    Application No.: US17216566

    Filing Date: 2021-03-29

    Applicant: Intel Corporation

    IPC Classification: G06F9/30 G06F9/38

    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify an M by N destination matrix, a first source identifier to identify an M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements. The processor also includes execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix: generate eight products by multiplying each nibble of a doubleword element (m, k) of the specified first source matrix by the corresponding nibble of a doubleword element (k, n) of the specified second source matrix, then accumulate and saturate the eight products with the previous contents of the doubleword element.
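The flow in this abstract can be sketched in Python as a reading aid. This is a minimal sketch under stated assumptions, not the patented implementation: the abstract does not say whether nibbles are signed or what the saturation bounds are, so the sketch treats nibbles as unsigned and saturates to the signed 32-bit range. The function names (`nibbles`, `saturate`, `tile_nibble_dot`) are hypothetical.

```python
# Assumption: results saturate to the signed 32-bit (doubleword) range.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def nibbles(dword):
    """Split a 32-bit value into eight 4-bit nibbles, lowest nibble first.

    Assumption: nibbles are treated as unsigned values in 0..15.
    """
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def saturate(value):
    """Clamp value to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def tile_nibble_dot(c, a, b):
    """Update the M x N matrix c in place from a (M x K) and b (K x N).

    For each destination element (m, n) the inner flow runs K times:
    eight nibble products are formed from a[m][k] and b[k][n], then
    accumulated into c[m][n] with saturation.
    """
    m_rows, k_dim, n_cols = len(a), len(b), len(b[0])
    for m in range(m_rows):
        for n in range(n_cols):
            acc = c[m][n]
            for k in range(k_dim):
                products = [x * y for x, y in zip(nibbles(a[m][k]), nibbles(b[k][n]))]
                acc = saturate(acc + sum(products))
            c[m][n] = acc
    return c
```

For example, with a 1 by 1 tile, `tile_nibble_dot([[5]], [[0x11111111]], [[0x22222222]])` adds eight products of 1 times 2 to the previous value 5.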

    Apparatus and method of improved insert instructions

    Publication No.: US11354124B2

    Publication Date: 2022-06-07

    Application No.: US15668508

    Filing Date: 2017-08-03

    Applicant: Intel Corporation

    IPC Classification: G06F9/30 G06F12/06 G06F9/38

    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction insert a first group of input vector elements into one of multiple first non-overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non-overlapping sections has the same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements into one of multiple second non-overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than the first bit width. Each of the multiple second non-overlapping sections has the same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and to mask the second and fourth instructions at a second resultant vector granularity.
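A rough Python model of one such masked insert may help make the section and granularity terms concrete. It is a sketch under several assumptions the abstract does not confirm: vectors are modeled as lists of equal-width elements, the elements outside the chosen section come from a separate source vector `src`, and masked-off positions keep the old destination value (merge masking). The function `masked_insert` and its parameters are hypothetical.

```python
def masked_insert(dest, src, group, section, mask, gran=1):
    """Model a masked insert of `group` into one section of a vector.

    dest    -- previous destination contents (list of elements)
    src     -- source vector supplying the untouched sections (assumption)
    group   -- elements to insert; its length is the section width
    section -- index of the non-overlapping section to overwrite
    mask    -- one flag per `gran` elements of the resultant vector
    gran    -- mask granularity, in elements per mask flag
    """
    width = len(group)
    if len(src) % width != 0 or not 0 <= section < len(src) // width:
        raise ValueError("section does not fit the vector")
    if len(dest) != len(src) or len(mask) * gran != len(src):
        raise ValueError("dest, src and mask sizes are inconsistent")

    # Unmasked result: src with the chosen section replaced by group.
    result = list(src)
    result[section * width:(section + 1) * width] = group

    # Merge masking (assumption): a cleared flag keeps the old dest value.
    return [result[i] if mask[i // gran] else dest[i] for i in range(len(result))]
```

Calling it with the same `group` and `section` but `gran=1` versus `gran=2` models two instructions that insert groups of the same width while being masked at different granularities.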

    Context save with variable save state size

    Publication No.: US11275588B2

    Publication Date: 2022-03-15

    Application No.: US16624178

    Filing Date: 2017-07-01

    Applicant: Intel Corporation

    IPC Classification: G06F9/30

    Abstract: Embodiments are described of an apparatus comprising a decoder to decode an instruction having fields for an opcode and a destination operand, and execution circuitry to execute the decoded instruction to perform a save of processor state components to an area located at a destination memory address specified by the destination operand, wherein a size of the area is defined by at least one indication of an execution of an instruction operating on a specified group of processor states.
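One way to picture a save area whose size depends on which state groups have been used is the toy model below. Everything in it is an assumption made for illustration: the group names, the byte sizes, the header size, and the fixed layout order are hypothetical and are not taken from the patent or from any real instruction set.

```python
# Hypothetical state groups and sizes in bytes (illustrative values only).
COMPONENT_SIZES = {"scalar": 64, "vector": 512, "matrix": 8192}
HEADER_SIZE = 64  # hypothetical fixed header


class StateTracker:
    """Records an indication each time an instruction touches a state group."""

    def __init__(self):
        self.used = set()

    def execute(self, group):
        """Model executing an instruction that operates on `group`."""
        if group not in COMPONENT_SIZES:
            raise KeyError(group)
        self.used.add(group)

    def save_area_size(self):
        """Size of the save area implied by the recorded indications."""
        return HEADER_SIZE + sum(COMPONENT_SIZES[g] for g in self.used)


def save_state(tracker, state):
    """Return the saved bytes: header, then each used component in fixed order."""
    area = bytearray(HEADER_SIZE)
    for name in COMPONENT_SIZES:
        if name in tracker.used:
            area += state[name]
    return bytes(area)
```

In this model a context that has only executed scalar and vector instructions needs a much smaller area than one that has also touched the matrix group, which is the effect the abstract describes.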