SYSTEMS AND METHODS FOR PERFORMING 16-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS

    公开(公告)号:US20210286620A1

    公开(公告)日:2021-09-16

    申请号:US17216566

    申请日:2021-03-29

    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.

    INTERLEAVED PIPELINE OF FLOATING-POINT ADDERS

    公开(公告)号:US20200310793A1

    公开(公告)日:2020-10-01

    申请号:US16369743

    申请日:2019-03-29

    Abstract: Disclosed embodiments relate to an interleaved pipeline of floating-point (FP) adders. In one example, a processor is to execute an instruction specifying an opcode and locations of a M by K first source matrix, a K by N second source matrix, and a M by N destination matrix, the opcode indicating execution circuitry, for each FP element (M, N) of the destination matrix, is to: launch K instances of a pipeline having a first, MULTIPLY stage, during which a FP element (M, K) of the first source matrix and a corresponding FP element (K, N) of the second source matrix are multiplied; concurrently, in an EXPDIFF stage, determine an exponent difference between the product and a previous FP value of the element (M, N) of the destination matrix; and in a second, ADD-BYPASS stage, accumulate the product with the previous FP value and, concurrently, bypassing the accumulated sum to a subsequent pipeline instance.

    USING FUZZY-JBIT LOCATION OF FLOATING-POINT MULTIPLY-ACCUMULATE RESULTS

    公开(公告)号:US20200310757A1

    公开(公告)日:2020-10-01

    申请号:US16369629

    申请日:2019-03-29

    Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.

Patent Agency Ranking