-
1.
Publication No.: US10579378B2
Publication date: 2020-03-03
Application No.: US14228016
Application date: 2014-03-27
Applicant: Intel Corporation
Inventors: Edward T. Grochowski, Victor W. Lee, Sergey A. Rozhkov, Boris A. Babayan
IPC: G06F9/30
Abstract: An apparatus and method are described for executing instructions using a predicate register. For example, one embodiment of a processor comprises: a register set including a predicate register to store a set of predicate condition bits, the predicate condition bits specifying whether results of a particular predicated instruction sequence are to be retained or discarded; and predicate execution logic to execute a first predicate instruction to indicate a start of a new predicated instruction sequence by copying a condition value from a processor control register in the register set to the predicate register. In a further embodiment, the predicate condition bits in the predicate register are to be shifted in response to the first predicate instruction to free space within the predicate register for the new condition value associated with the new predicated instruction sequence.
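The shift-and-copy behavior described in this abstract can be illustrated with a small software model. This is a hedged sketch, not the patented design: the class name `PredicateRegister`, the register width, the placement of the newest condition in bit 0, and the rule that results are retained when that bit is 1 are all assumptions made for illustration.

```python
class PredicateRegister:
    """Toy predicate register (hypothetical): fixed-width condition bits, newest in bit 0."""

    def __init__(self, width: int = 8) -> None:
        self.width = width
        self.bits = 0  # all condition bits start cleared

    def begin_sequence(self, condition: bool) -> None:
        # Shift the existing bits left to free bit 0 (the oldest bit falls off
        # when the register is full), then copy the new condition into bit 0.
        mask = (1 << self.width) - 1
        self.bits = ((self.bits << 1) | int(condition)) & mask

    def retain_results(self) -> bool:
        # Assumed rule: keep the current sequence's results only if bit 0 is set.
        return bool(self.bits & 1)


# The condition values would come from a processor control register (e.g. a flag bit).
pred = PredicateRegister(width=4)
pred.begin_sequence(True)   # bits = 0b0001
pred.begin_sequence(False)  # bits = 0b0010
print(bin(pred.bits), pred.retain_results())  # prints "0b10 False"
```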
-
2.
Publication No.: US10241794B2
Publication date: 2019-03-26
Application No.: US15391703
Application date: 2016-12-27
Applicant: Intel Corporation
Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
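A rough software analogy for counted strand execution follows. It assumes that a strand's documentation holds only a name and an iteration count, and it uses Python threads in place of hardware strand execution circuits; `StrandDoc`, `run_strand`, and `run_loop` are hypothetical names, not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StrandDoc:
    """Hypothetical strand documentation entry."""
    name: str
    iterations: int  # counted exit: how many times the strand repeats


def run_strand(doc: StrandDoc, body: Callable[[int], int]) -> List[int]:
    # One "strand execution circuit": repeat the strand body exactly
    # doc.iterations times, then exit.
    return [body(i) for i in range(doc.iterations)]


def run_loop(docs: List[StrandDoc], bodies: List[Callable[[int], int]]) -> Dict[str, List[int]]:
    # Launch every strand at once; threads stand in for parallel hardware.
    with ThreadPoolExecutor(max_workers=len(docs)) as pool:
        futures = {d.name: pool.submit(run_strand, d, b) for d, b in zip(docs, bodies)}
        return {name: f.result() for name, f in futures.items()}


# Loop "for i in range(4): a[i] = i * i; b[i] = i + 10", split into two strands.
docs = [StrandDoc("a", 4), StrandDoc("b", 4)]
print(run_loop(docs, [lambda i: i * i, lambda i: i + 10]))
# prints {'a': [0, 1, 4, 9], 'b': [10, 11, 12, 13]}
```

Because each strand carries its own iteration count, no strand needs to consult another to decide when to stop, which is what lets them run asynchronously in this model.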
-
3.
Publication No.: US20180181400A1
Publication date: 2018-06-28
Application No.: US15391703
Application date: 2016-12-27
Applicant: Intel Corporation
Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
IPC: G06F9/32
CPC classification number: G06F9/325, G06F8/443, G06F8/4452, G06F8/452, G06F9/30065, G06F9/381, G06F9/3851, G06F9/45516
Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
-
4.
Publication No.: US10241801B2
Publication date: 2019-03-26
Application No.: US15390194
Application date: 2016-12-23
Applicant: Intel Corporation
Inventors: Jayesh Iyer, Sergey P. Scherbinin, Alexander Y. Ostanevich, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
Abstract: An apparatus includes a register file and a binary translator to create a plurality of strands and a plurality of iteration windows, where each iteration window of the plurality of iteration windows is allocated a set of continuous registers of the register file. The apparatus further includes a buffer to store strand documentation for a strand from the plurality of strands, where the strand documentation for the strand is to include an indication of a current register base for the strand. The apparatus further includes an execution circuit to execute an instruction to update the current register base for the strand in the strand documentation for the strand based on a fixed step value and an iteration window size.
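One plausible reading of the register-base update is a rotating-register scheme, sketched below under stated assumptions: the base advances by a fixed step and wraps modulo the iteration window size, and a strand-relative register number is added to the base to pick a physical register inside the window. The abstract does not give this formula; the class name, field names, and wrap-around rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StrandDocumentation:
    """Hypothetical per-strand state for a window of contiguous registers."""
    window_start: int  # first physical register allocated to the iteration window
    window_size: int   # number of contiguous registers in the window
    reg_base: int = 0  # current register base, relative to window_start

    def update_base(self, step: int) -> None:
        # Assumed update rule: advance by a fixed step, wrapping inside the window.
        self.reg_base = (self.reg_base + step) % self.window_size

    def physical(self, logical: int) -> int:
        # Map a strand-relative register number to a physical register.
        return self.window_start + (self.reg_base + logical) % self.window_size


doc = StrandDocumentation(window_start=32, window_size=8)
seen = []
for _ in range(4):            # four loop iterations
    seen.append(doc.physical(0))
    doc.update_base(step=3)   # fixed step of 3 registers per iteration
print(seen)  # prints [32, 35, 38, 33]
```

The same logical register name thus refers to a different physical register on each iteration, without any register copying between iterations.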
-
5.
Publication No.: US10241789B2
Publication date: 2019-03-26
Application No.: US15391789
Application date: 2016-12-27
Applicant: Intel Corporation
Inventors: Alexander Y. Ostanevich, Sergey P. Scherbinin, Jayesh Iyer, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
Abstract: An apparatus includes a binary translator to hoist a load instruction in a branch of a conditional statement above the conditional statement and insert a speculation control of load (SCL) instruction in a complementary branch of the conditional statement, where the SCL instruction provides an indication of a real program order (RPO) of the load instruction before the load instruction was hoisted. The apparatus further includes an execution circuit to execute the load instruction to perform a load and cause an entry for the load instruction to be inserted in an ordering buffer, and where the execution circuit is to execute the SCL instruction to locate the entry for the load instruction in the ordering buffer using the RPO of the load instruction provided by the SCL instruction and discard the entry for the load instruction from the ordering buffer.
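The hoist-then-cancel pattern can be modeled with a dictionary standing in for the ordering buffer. In this hypothetical sketch the hoisted load always records an entry keyed by its RPO; if control flow takes the complementary branch, the SCL step uses the same RPO to find and discard that entry. `OrderingBuffer` and `run_conditional` are illustrative names, and the RPO value 7 is arbitrary.

```python
from typing import Dict


class OrderingBuffer:
    """Toy ordering buffer (hypothetical): entries waiting to retire, keyed by RPO."""

    def __init__(self) -> None:
        self.entries: Dict[int, str] = {}

    def insert(self, rpo: int, desc: str) -> None:
        self.entries[rpo] = desc

    def discard(self, rpo: int) -> None:
        # Locate the entry by RPO and drop it; a missing entry is ignored here.
        self.entries.pop(rpo, None)


def run_conditional(take_load_branch: bool, load_rpo: int = 7) -> OrderingBuffer:
    buf = OrderingBuffer()
    # The load was hoisted above the condition, so it always executes first
    # and inserts an entry tagged with its original (pre-hoist) RPO.
    buf.insert(load_rpo, "load [x]")
    if not take_load_branch:
        # Complementary branch: the SCL instruction carries load_rpo and
        # uses it to discard the now-unneeded load entry.
        buf.discard(load_rpo)
    return buf


print(run_conditional(True).entries)   # prints {7: 'load [x]'}
print(run_conditional(False).entries)  # prints {}
```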
-
6.
Publication No.: US10235171B2
Publication date: 2019-03-19
Application No.: US15391791
Application date: 2016-12-27
Applicant: Intel Corporation
Inventors: Alexander Y. Ostanevich, Jayesh Iyer, Sergey P. Scherbinin, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
Abstract: An apparatus includes a first circuit to determine a real program order (RPO) of an eldest undispatched instruction from among a plurality of strands, a second circuit to determine an RPO limit based on a delta value and the RPO of the eldest undispatched instruction, an ordering buffer to store entries for instructions that are waiting to be retired, and a third circuit to execute an orderable instruction from a strand from the plurality of strands to cause an entry for the orderable instruction to be inserted into the ordering buffer in response to a determination that an RPO of the orderable instruction is less than or equal to the RPO limit.
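The admission check reduces to a comparison against a sliding limit. The sketch below assumes that each strand exposes the RPO of its next undispatched instruction and that the limit is simply the smallest of those RPOs plus `delta`; the function names and sample values are hypothetical.

```python
from typing import List


def rpo_limit(next_undispatched: List[int], delta: int) -> int:
    # next_undispatched[k] is the RPO of strand k's next undispatched
    # instruction; the eldest one has the smallest RPO.
    return min(next_undispatched) + delta


def may_enter_ordering_buffer(rpo: int, limit: int) -> bool:
    # An orderable instruction is admitted only if its RPO <= the limit;
    # otherwise it waits until older instructions dispatch and the limit rises.
    return rpo <= limit


limit = rpo_limit([10, 12, 30], delta=8)  # eldest is 10, so limit = 18
print(limit, may_enter_ordering_buffer(14, limit), may_enter_ordering_buffer(20, limit))
# prints "18 True False"
```

Bounding how far ahead of the eldest undispatched instruction a strand may run keeps the number of ordering-buffer entries bounded by roughly `delta`.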
-
7.
Publication No.: US20180181398A1
Publication date: 2018-06-28
Application No.: US15392626
Application date: 2016-12-28
Applicant: Intel Corporation
Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
CPC classification number: G06F9/30083, G06F8/452, G06F9/3004, G06F9/30072, G06F9/30145, G06F9/3017, G06F9/325, G06F9/35, G06F9/3802, G06F9/3851, G06F9/3867, G06F9/3885
Abstract: Embodiments described herein relate to apparatus and methods for decomposing loops to improve performance and power efficiency. In one embodiment, a processor includes: a loop accelerator including a plurality of strand execution circuits, a binary translator to: receive a plurality of instructions from an instruction storage, to determine whether the plurality of instructions include loop instructions, and, in response to determining that they do, to divide the loop instructions into two or more jobs using at least one job creation rule, to assign the two or more jobs to two or more strands using at least one strand creation rule, and to cause the loop accelerator to execute at least two of the two or more strands in parallel using the plurality of strand execution circuits.
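To make the job and strand creation rules concrete, the sketch below substitutes two simple rules of its own: instructions connected by register data dependences within one iteration form a job, and jobs are dealt round-robin to strands. Neither rule comes from the patent, whose abstract does not specify its rules; the tuple encoding of instructions is also an assumption, and loop-carried dependences are ignored.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a loop-body instruction: (text, dest register, source registers).
# Stores write a pseudo-register name such as "mem_b".
Instr = Tuple[str, str, Tuple[str, ...]]


def make_jobs(body: List[Instr]) -> List[List[Instr]]:
    # Assumed job creation rule: instructions linked by register data
    # dependences within one iteration belong to the same job.
    parent = list(range(len(body)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    last_writer: Dict[str, int] = {}
    for i, (_, dst, srcs) in enumerate(body):
        for reg in srcs:
            if reg in last_writer:
                parent[find(i)] = find(last_writer[reg])  # union producer and consumer
        last_writer[dst] = i

    jobs: Dict[int, List[Instr]] = {}
    for i, instr in enumerate(body):
        jobs.setdefault(find(i), []).append(instr)
    return list(jobs.values())


def assign_strands(jobs: List[List[Instr]], n_strands: int) -> List[List[Instr]]:
    # Assumed strand creation rule: deal jobs round-robin across strands.
    strands: List[List[Instr]] = [[] for _ in range(n_strands)]
    for k, job in enumerate(jobs):
        strands[k % n_strands].extend(job)
    return strands


# The induction variable i is treated as a read-only input here.
body = [
    ("ld r1, a[i]", "r1", ("i",)),
    ("add r2, r1, 1", "r2", ("r1",)),
    ("st b[i], r2", "mem_b", ("r2", "i")),
    ("ld r3, c[i]", "r3", ("i",)),
    ("mul r4, r3, r3", "r4", ("r3",)),
    ("st d[i], r4", "mem_d", ("r4", "i")),
]
for n, strand in enumerate(assign_strands(make_jobs(body), n_strands=2)):
    print(n, [text for text, _, _ in strand])
# prints:
# 0 ['ld r1, a[i]', 'add r2, r1, 1', 'st b[i], r2']
# 1 ['ld r3, c[i]', 'mul r4, r3, r3', 'st d[i], r4']
```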