-
Publication No.: US20200310756A1
Publication Date: 2020-10-01
Application No.: US16370966
Filing Date: 2019-03-30
Applicant: Intel Corporation
Inventor: Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER, Mrinmay DUTTA
Abstract: Disclosed embodiments relate to performing floating-point addition with selected rounding. In one example, a processor includes circuitry to decode and execute an instruction specifying locations of first and second floating-point (FP) sources, and an opcode indicating the processor is to: bring the FP sources into alignment by shifting a mantissa of the smaller source FP operand to the right by a difference between their exponents, generating rounding controls based on any bits that escape; simultaneously generate a sum of the FP sources and a sum of the FP sources plus one, the sums having a fuzzy-Jbit format with an additional Jbit to receive a carry-out, if any; select one of the sums based on the rounding controls; and generate a result comprising a mantissa-wide number of most-significant bits of the selected sum, starting with the most significant non-zero Jbit.
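The steps named in this abstract (align, generate rounding controls from escaped bits, compute sum and sum plus one, select, normalize) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: operands are positive, mantissas are plain integers normalized to `width` bits, rounding is round-to-nearest-even, and the function name `fp_add` and its parameters are hypothetical. Hardware would produce both sums in parallel; here they are simply computed one after the other.

```python
def fp_add(m_a, e_a, m_b, e_b, width):
    """Add two positive values m * 2**e (hypothetical model, not the patent's circuit).

    Assumes each mantissa is normalized to exactly `width` bits and that
    rounding is round-to-nearest-even. Returns (mantissa, exponent).
    """
    # Make (m_a, e_a) the operand with the larger exponent.
    if e_a < e_b:
        m_a, e_a, m_b, e_b = m_b, e_b, m_a, e_a

    # Align: shift the smaller operand right by the exponent difference.
    shift = e_a - e_b
    aligned = m_b >> shift
    escaped = m_b & ((1 << shift) - 1)  # bits shifted out during alignment

    # Both candidate sums; the extra top bit absorbs a carry-out, if any.
    sum0 = m_a + aligned
    sum1 = sum0 + 1

    # Rounding controls depend on whether the carry-out position is occupied.
    if sum0 >> width:
        # Carry-out: the lowest sum bit will be dropped and acts as the guard bit.
        guard = sum0 & 1
        sticky = escaped != 0
        lsb = (sum0 >> 1) & 1
    else:
        # No carry-out: guard and sticky come only from the escaped bits.
        guard = (escaped >> (shift - 1)) & 1 if shift else 0
        sticky = (escaped & ((1 << (shift - 1)) - 1)) != 0 if shift else False
        lsb = sum0 & 1

    round_up = bool(guard) and (sticky or bool(lsb))
    selected = sum1 if round_up else sum0

    # Keep `width` bits starting at the most significant non-zero bit.
    drop = max(selected.bit_length() - width, 0)
    return selected >> drop, e_a + drop
```

For example, with 4-bit mantissas, `fp_add(0b1111, 0, 0b1000, -4, 4)` models 15 + 0.5: the tie rounds to even, so the sum-plus-one candidate is selected and renormalized to `(8, 1)`, that is, 16.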
-
Publication No.: US20200026515A1
Publication Date: 2020-01-23
Application No.: US16338324
Filing Date: 2016-10-20
Applicant: Intel Corporation
Inventor: Robert Valentine, Galina RYVCHIN, Piotr MAJCHER, Mark J. CHARNEY, Elmoustapha OULD-AHMED-VALL, Jesus CORBAL, Milind B. GIRKAR, Zeev SPERBER, Simon RUBANOVICH, Amit GRADSTEIN
Abstract: In some embodiments, packed data elements of first and second packed data source operands are of a first size that differs from a second size of packed data elements of a third packed data operand. Execution circuitry executes a decoded single instruction to perform, for each packed data element position of a destination operand, a multiplication of M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, an addition of the results of these multiplications to a full-sized packed data element of that packed data element position of the third packed data source, and storage of the addition result in a packed data element position of the destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element size divided by N.
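The per-position arithmetic described here can be sketched in a few lines. This is an illustrative model only: the function name `multiply_accumulate`, its parameters, and the use of unbounded Python integers are assumptions, and the sketch ignores signedness, saturation, and wrap-around, none of which the abstract specifies.

```python
def multiply_accumulate(src1, src2, src3, full_bits=32, n_bits=8):
    """Hypothetical model: for each full-sized position of src3, multiply the
    M corresponding N-sized elements of src1 and src2 and add the products
    to the src3 element. M = full_bits // n_bits, as in the abstract.
    """
    m = full_bits // n_bits
    if len(src1) != len(src2) or len(src1) != m * len(src3):
        raise ValueError("sources must hold M small elements per full-sized element")

    dest = []
    for pos, acc in enumerate(src3):
        # The M small elements that line up with this full-sized position.
        lo, hi = pos * m, (pos + 1) * m
        products = (a * b for a, b in zip(src1[lo:hi], src2[lo:hi]))
        dest.append(acc + sum(products))
    return dest
```

With the default sizes (M = 4), `multiply_accumulate([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 2, 2, 2, 2], [100, 200])` returns `[110, 252]`.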
-
Publication No.: US20190079762A1
Publication Date: 2019-03-14
Application No.: US16186384
Filing Date: 2018-11-09
Applicant: Intel Corporation
Inventor: Alexander F. HEINECKE, Robert VALENTINE, Mark J. CHARNEY, Raanan SADE, Menachem ADELMAN, Zeev SPERBER, Amit GRADSTEIN, Simon RUBANOVICH
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes: fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector; decode circuitry to decode the fetched instruction; and execution circuitry to respond to the decoded instruction as specified by the opcode.
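An element-wise conversion of this kind can be sketched as follows. The abstract does not name the 16-bit format or the rounding mode, so this sketch assumes a bfloat16-style layout (the upper 16 bits of an IEEE 754 binary32 value) and round-to-nearest-even; the function names `f32_to_bf16_bits` and `convert_vector` are hypothetical.

```python
import struct


def f32_to_bf16_bits(x):
    """Return the 16-bit pattern for x, assuming a bfloat16-style layout
    and round-to-nearest-even (both are assumptions, not from the abstract).
    """
    # Reinterpret the value as its 32-bit single-precision bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]

    # NaN: keep the top bits and force a quiet-NaN mantissa bit so the
    # truncated result cannot collapse into infinity.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF

    # Round to nearest even: add 0x7FFF plus the lowest kept bit, then truncate.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def convert_vector(src):
    """Convert each source element and store it at the matching destination index."""
    return [f32_to_bf16_bits(x) for x in src]
```

For instance, `convert_vector([1.0, -2.0])` gives `[0x3F80, 0xC000]`, and the halfway value `1.00390625` (1 + 2^-8) rounds to the even neighbor `0x3F80`.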
-
Publication No.: US20190042202A1
Publication Date: 2019-02-07
Application No.: US16144889
Filing Date: 2018-09-27
Applicant: Intel Corporation
Inventor: Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER, Bret TOLL, Jesus CORBAL, Christopher J. HUGHES, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL
IPC: G06F7/78, G06F9/30, G06F9/38, G06F15/173
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes: fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix; decode circuitry to decode the fetched rectangular matrix transpose instruction; and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.
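The row-to-column mapping for the two source/destination pairs can be shown with a minimal sketch. Matrices are modeled as lists of rows, and the names `transpose` and `transpose_pair` are hypothetical; tile registers, element sizes, and memory layout are outside what this model covers.

```python
def transpose(matrix):
    """Return a new matrix in which row i of the input becomes column i."""
    return [list(column) for column in zip(*matrix)]


def transpose_pair(src1, src2):
    """Hypothetical model of one instruction handling two rectangular
    source matrices: returns (dst1, dst2), each the transpose of its source.
    """
    return transpose(src1), transpose(src2)
```

For example, `transpose_pair([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10]])` returns `([[1, 4], [2, 5], [3, 6]], [[7, 9], [8, 10]])`, so a 2x3 source yields a 3x2 destination.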
-