-
Publication number: US10241794B2
Publication date: 2019-03-26
Application number: US15391703
Application date: 2016-12-27
Applicant: Intel Corporation
Inventor: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
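As a rough software analogy of the counted-loop scheme in the abstract above (not the patented hardware), the sketch below models each strand documentation as a record holding an iteration count and runs the strands in parallel worker threads. The names `StrandDoc`, `run_strand`, and `run_loop` are hypothetical and do not come from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StrandDoc:
    """Hypothetical stand-in for one entry of the strand documentation buffer."""
    name: str
    iterations: int               # number of times the strand body repeats
    body: Callable[[int], int]    # work done for one iteration index


def run_strand(doc: StrandDoc) -> List[int]:
    # One "strand execution circuit": repeat the body for the documented count.
    return [doc.body(i) for i in range(doc.iterations)]


def run_loop(docs: List[StrandDoc]) -> Dict[str, List[int]]:
    # Strands run independently of each other; threads only model the
    # asynchronous, parallel execution described in the abstract.
    with ThreadPoolExecutor(max_workers=len(docs)) as pool:
        futures = {doc.name: pool.submit(run_strand, doc) for doc in docs}
        return {name: fut.result() for name, fut in futures.items()}


docs = [
    StrandDoc("load", 4, lambda i: i),
    StrandDoc("square", 4, lambda i: i * i),
]
results = run_loop(docs)
```

Because every strand carries its own iteration count, no strand has to wait for a shared loop-exit decision; each one simply stops after its documented number of repetitions.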
-
Publication number: US20180181398A1
Publication date: 2018-06-28
Application number: US15392626
Application date: 2016-12-28
Applicant: Intel Corporation
Inventor: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
CPC classification number: G06F9/30083, G06F8/452, G06F9/3004, G06F9/30072, G06F9/30145, G06F9/3017, G06F9/325, G06F9/35, G06F9/3802, G06F9/3851, G06F9/3867, G06F9/3885
Abstract: Embodiments described herein relate to apparatus and methods for decomposing loops to improve performance and power efficiency. In one embodiment, a processor includes: a loop accelerator including a plurality of strand execution circuits, a binary translator to: receive a plurality of instructions from an instruction storage, to determine whether the plurality of instructions include loop instructions, and, in response to determining that they do, to divide the loop instructions into two or more jobs using at least one job creation rule, to assign the two or more jobs to two or more strands using at least one strand creation rule, and to cause the loop accelerator to execute at least two of the two or more strands in parallel using the plurality of strand execution circuits.
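The two-stage decomposition in the abstract above (instructions to jobs, then jobs to strands) can be pictured with a toy sketch. The abstract does not state the actual job creation or strand creation rules, so both rules here are assumptions chosen only for illustration: instructions are grouped into jobs by opcode, and jobs are dealt out to strands round-robin. The function names `make_jobs` and `assign_strands` are likewise hypothetical.

```python
from typing import Callable, Dict, List

Job = List[str]


def make_jobs(loop_body: List[str], job_rule: Callable[[str], str]) -> Dict[str, Job]:
    # Job creation rule (assumed): instructions with the same key form one job.
    jobs: Dict[str, Job] = {}
    for instr in loop_body:
        jobs.setdefault(job_rule(instr), []).append(instr)
    return jobs


def assign_strands(jobs: Dict[str, Job], n_strands: int) -> List[List[str]]:
    # Strand creation rule (assumed): deal job names out round-robin.
    strands: List[List[str]] = [[] for _ in range(n_strands)]
    for idx, name in enumerate(jobs):
        strands[idx % n_strands].append(name)
    return strands


loop_body = ["load a[i]", "load b[i]", "mul t, a, b", "store c[i]"]
jobs = make_jobs(loop_body, job_rule=lambda instr: instr.split()[0])
strands = assign_strands(jobs, n_strands=2)
```

Keeping the two rules as separate, swappable steps mirrors the abstract's wording, which names "at least one job creation rule" and "at least one strand creation rule" independently.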
-
Publication number: US20180285119A1
Publication date: 2018-10-04
Application number: US15562408
Application date: 2015-03-27
Applicant: Intel Corporation
Inventor: Alexandr Titov, Dmitry Maslennikov, Sergey Y. SHISHLOV, Valentin Burov, Pavel Matveyev
IPC: G06F9/38
Abstract: A processor includes execution units, a front end, and an execution engine. The front end includes logic to receive instructions in different strands of ordered instructions and to send the instructions to the execution engine. The engine includes logic to determine that the instructions in different strands reference a same logical register mapped to a physical register, that the instructions reference each other, and that one of the instructions referencing the other was processed after the instruction defining the logical register.
-
Publication number: US20180181400A1
Publication date: 2018-06-28
Application number: US15391703
Application date: 2016-12-27
Applicant: Intel Corporation
Inventor: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
IPC: G06F9/32
CPC classification number: G06F9/325, G06F8/443, G06F8/4452, G06F8/452, G06F9/30065, G06F9/381, G06F9/3851, G06F9/45516
Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
-