-
公开(公告)号:US20190243684A1
公开(公告)日:2019-08-08
申请号:US15890984
申请日:2018-02-07
Applicant: Intel Corporation
Inventor: Pooja Roy , Jayesh Gaur , Sreenivas Subramoney , Zeev Sperber , Alexandr Titov , Lihu Rappoport , Stanislav Shwartsman , Hong Wang , Adi Yoaz , Ronak Singhal , Robert S. Chappell
Abstract: A processor including an execution unit, an instruction scheduler circuit to identify a first instruction of an instruction stream, identify a second instruction on which execution of the first instruction depends, and assign a first dispatch priority value to the first instruction and the second instruction, and a dispatch circuit to dispatch, based on the first dispatch priority value, the first instruction and the second instruction to an instruction execution circuit.
-
公开(公告)号:US10324718B2
公开(公告)日:2019-06-18
申请号:US15864158
申请日:2018-01-08
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal San Andrian , Suleyman Sair , Bret L. Toll , Zeev Sperber , Amit Gradstein , Asaf Rubinstein
IPC: G06F9/30
Abstract: A method of an aspect includes receiving a masked packed rotate instruction. The instruction indicates a first source packed data including a plurality of packed data elements, a packed data operation mask having a plurality of mask elements, at least one rotation amount, and a destination storage location. A result packed data is stored in the destination storage location in response to the instruction. The result packed data includes result data elements that each correspond to a different one of the mask elements in a corresponding relative position. Result data elements that are not masked out by the corresponding mask element include one of the data elements of the first source packed data in a corresponding position that has been rotated. Result data elements that are masked out by the corresponding mask element include a masked out value. Other methods, apparatus, systems, and instructions are disclosed.
-
公开(公告)号:US20190171515A1
公开(公告)日:2019-06-06
申请号:US15831195
申请日:2017-12-04
Applicant: Intel Corporation
Inventor: Zeev Sperber , Stanislav Shwartsman , Jared W. Stark, IV , Lihu Rappoport , Igor Yanover , George Leifman
CPC classification number: G06F11/0793 , G06F9/30043 , G06F9/30058 , G06F9/3802 , G06F9/3855 , G06F11/0721 , G06F12/0215 , G06F12/0253 , G06F2212/654 , G06F2212/702
Abstract: A method for handling load faults in an out-of-order processor is described. The method includes detecting, by a memory ordering buffer of the out-of-order processor, a load fault corresponding to a load instruction that was executed out-of-order by the out-of-order processor; determining, by the memory ordering buffer, whether instant reclamation is available for resolving the load fault of the load instruction; and performing, in response to determining that instant reclamation is available for resolving the load fault of the load instruction, instant reclamation to re-fetch the load instruction for execution prior to attempting to retire the load instruction.
-
公开(公告)号:US10223112B2
公开(公告)日:2019-03-05
申请号:US15721799
申请日:2017-09-30
Applicant: Intel Corporation
Inventor: Seth Abraham , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Zeev Sperber , Amit Gradstein
Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
-
公开(公告)号:US20190042255A1
公开(公告)日:2019-02-07
申请号:US15858937
申请日:2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
-
公开(公告)号:US20190042254A1
公开(公告)日:2019-02-07
申请号:US15858932
申请日:2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
-
公开(公告)号:US20190042235A1
公开(公告)日:2019-02-07
申请号:US15858916
申请日:2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
IPC: G06F9/30
Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).
-
公开(公告)号:US10156884B2
公开(公告)日:2018-12-18
申请号:US15354018
申请日:2016-11-17
Applicant: INTEL CORPORATION
Inventor: Michael Mishaeli , Ron Gabor , Robert C. Valentine , Alex Gerber , Zeev Sperber
Abstract: Technologies for local power gate (LPG) interfaces for power-aware operations are described. A system on chip (SoC) includes a first functional unit, a second functional unit, and local power gate (LPG) hardware coupled to the first functional unit and the second functional unit. The LPG hardware is to power gate the first functional unit according to local power states of the LPG hardware. The second functional unit decodes a first instruction to perform a first power-aware operation of a specified length, including computing an execution code path for execution. The second functional unit monitors a current local power state of the LPG hardware, selects a code path based on the current local power state, the specified length, and a specified threshold, and issues a hint to the LPG hardware to power up the first functional unit and continues execution of the first power-aware operation without waiting for the first functional unit to be powered up.
-
公开(公告)号:US09996320B2
公开(公告)日:2018-06-12
申请号:US14757942
申请日:2015-12-23
Applicant: Intel Corporation
Inventor: Cristina S. Anderson , Marius A. Cornea-Hasegan , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Nikita Astafev , Mark J. Charney , Milind B. Girkar , Amit Gradstein , Simon Rubanovich , Zeev Sperber
CPC classification number: G06F7/4876 , G06F7/485 , G06F7/49915
Abstract: An example processor includes a register and a fused multiply-add (FMA) low functional unit. The register stores first, second, and third floating point (FP) values. The FMA low functional unit receives a request to perform an FMA low operation: multiplies the first FP value with the second FP value to obtain a first product value; adds the first product with the third FP value to generate a first result value; rounds the first result to generate a first FMA value; multiplies the first FP value with the second FP value to obtain a second product value; adds the second product value with the third FP value to generate a second result value; and subtracts the FMA value from the second result value to obtain a third result value, which can then be normalized and rounded (FMA low result) and sent the FMA low result to an application.
-
公开(公告)号:US20180129266A1
公开(公告)日:2018-05-10
申请号:US15849838
申请日:2017-12-21
Applicant: Intel Corporation
Inventor: Nadav Bonen , Ron Gabor , Zeev Sperber , Vjekoslav Svilan , David N. Mackintosh , Jose A. Baiocchi Paredes , Naveen Kumar , Shantanu Gupta
CPC classification number: G06F1/3243 , G06F1/3287 , G06F9/30083 , G06F9/3869 , G06F9/3885 , Y02B70/123 , Y02B70/126 , Y02D10/152 , Y02D10/171
Abstract: In an embodiment, the present invention includes an execution unit to execute instructions of a first type, a local power gate circuit coupled to the execution unit to power gate the execution unit while a second execution unit is to execute instructions of a second type, and a controller coupled to the local power gate circuit to cause it to power gate the execution unit when an instruction stream does not include the first type of instructions. Other embodiments are described and claimed.
-
-
-
-
-
-
-
-
-