-
11.
Publication No.: US20220100502A1
Publication Date: 2022-03-31
Application No.: US17134008
Application Date: 2020-12-24
Applicant: Intel Corporation
Inventor: ALEXANDER F. HEINECKE, ROBERT VALENTINE, MARK J. CHARNEY, MENACHEM ADELMAN, CHRISTOPHER J. HUGHES, EVANGELOS GEORGANAS, ZEEV SPERBER, AMIT GRADSTEIN, SIMON RUBANOVICH
Abstract: Systems, methods, and apparatuses relating to 16-bit floating-point matrix dot product instructions are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a M by N destination matrix having single-precision elements, an M by K first source matrix, and a K by N second source matrix, the source matrices having elements that each comprise a pair of half-precision floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the half-precision floating-point values to single-precision values, a multiplication of converted single-precision values from first values of the pairs together to generate a first result, a multiplication of converted single-precision values from second values of the pairs together to generate a second result, and an accumulation of the first result and the second result with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
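The arithmetic described in this abstract can be illustrated with a minimal software sketch. It is not the patented circuitry: the function and helper names (`fp16_pair_dot_product`, `round_f16`, `round_f32`) are hypothetical, and rounding every intermediate result to single precision and accumulating in increasing k order are assumptions the abstract does not state. Each source element is modeled as a tuple of two half-precision values.

```python
import struct


def round_f16(x):
    """Round a Python float to the nearest half-precision (binary16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_f32(x):
    """Round a Python float to the nearest single-precision (binary32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp16_pair_dot_product(c, a, b):
    """Accumulate a (M x K) times b (K x N) into c (M x N), in place.

    a and b hold pairs of half-precision values per element; c holds
    single-precision values. Rounding points are an assumption of this sketch.
    """
    m_rows, k_len, n_cols = len(a), len(b), len(b[0])
    for m in range(m_rows):
        for n in range(n_cols):
            acc = round_f32(c[m][n])  # previous contents of the destination
            for k in range(k_len):
                # Convert both halves of each pair from half to single precision.
                a0, a1 = (round_f32(round_f16(v)) for v in a[m][k])
                b0, b1 = (round_f32(round_f16(v)) for v in b[k][n])
                first = round_f32(a0 * b0)    # product of the first values
                second = round_f32(a1 * b1)   # product of the second values
                acc = round_f32(acc + first + second)
            c[m][n] = acc
    return c


# Example with M=1, K=2, N=1.
a = [[(1.0, 2.0), (3.0, 4.0)]]
b = [[(5.0, 6.0)], [(7.0, 8.0)]]
c = [[10.0]]
result = fp16_pair_dot_product(c, a, b)
```

With these inputs the destination element becomes 10 + (1*5 + 2*6) + (3*7 + 4*8) = 80.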
-
Publication No.: US20180081689A1
Publication Date: 2018-03-22
Application No.: US15809818
Application Date: 2017-11-10
Applicant: Intel Corporation
Inventor: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL, BRET L. TOLL, MARK J. CHARNEY, ZEEV SPERBER, AMIT GRADSTEIN
IPC: G06F9/30
Abstract: An apparatus is described that includes instruction execution circuitry to execute first, second, third, and fourth instructions, the first and second instructions select a first group of input vector elements from one of multiple first non-overlapping sections of respective first and second input vectors. Each of the multiple first non-overlapping sections have a same bit width as the first group. Both the third and fourth instructions select a second group of input vector elements from one of multiple second non-overlapping sections of respective third and fourth input vectors. The second group has a second bit width that is larger than the first bit width. Each of multiple second non-overlapping sections have a same bit width as the second group. The apparatus includes masking layer circuitry to mask the first and second groups at a first granularity and second granularity.
-