Floating-point supportive pipeline for emulated shared memory architectures

    Publication No.: US11797310B2

    Publication Date: 2023-10-24

    Application No.: US15031285

    Filing Date: 2014-10-23

    Inventor: Martti Forsell

    IPC Classes: G06F9/38 G06F15/76

    Abstract: A processor architecture arrangement for emulated shared memory (ESM) architectures is disclosed. The arrangement has a number of multi-threaded processors, each provided with an interleaved inter-thread pipeline and a plurality of functional units for carrying out arithmetic and logical operations on data. The pipeline has at least two operatively parallel pipeline branches. The first pipeline branch includes a first sub-group of the plurality of functional units, such as ALUs (arithmetic logic units), for carrying out integer operations. The second pipeline branch includes a second, non-overlapping sub-group of the plurality of functional units, such as FPUs (floating point units), for carrying out floating point operations. One or more of the functional units of at least the second sub-group are located operatively in parallel with the memory access segment of the pipeline.
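    The two-branch routing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the operation names and branch labels are assumptions.

```python
# Illustrative sketch of routing operations to two parallel pipeline
# branches: an integer (ALU) branch and a floating-point (FPU) branch.
# All names here are assumptions, not taken from the patent.

INTEGER_OPS = {"add", "sub", "and", "or"}   # first sub-group: ALUs
FLOAT_OPS = {"fadd", "fmul", "fdiv"}        # second sub-group: FPUs

def route(op):
    """Pick the pipeline branch that executes an operation."""
    if op in INTEGER_OPS:
        return "alu-branch"
    if op in FLOAT_OPS:
        # The FPU branch sits operatively in parallel with the
        # memory-access segment, so FP latency overlaps memory access.
        return "fpu-branch"
    return "memory-segment"

print([route(op) for op in ("add", "fmul", "load")])
# → ['alu-branch', 'fpu-branch', 'memory-segment']
```

    Because the two sub-groups are non-overlapping, each operation has exactly one home branch, which is what makes this simple table-lookup routing possible.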

    PROCESSOR WITH ADAPTIVE PIPELINE LENGTH
    Invention Publication

    Publication No.: US20230273797A1

    Publication Date: 2023-08-31

    Application No.: US18314264

    Filing Date: 2023-05-09

    IPC Classes: G06F9/38

    CPC Classes: G06F9/3873 G06F9/3838

    Abstract: A system and method for reducing pipeline latency. In one embodiment, a processing system includes a processing pipeline with a plurality of processing stages, each configured to further the processing performed by the previous stage. A first of the stages is configured to perform a first function in a pipeline cycle. A second of the stages, disposed downstream of the first, is configured to perform, in a pipeline cycle, a second function that differs from the first. The first of the stages is further configured to selectably perform both the first function and the second function in a single pipeline cycle and bypass the second of the stages.
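    A hedged sketch of the selectable-bypass idea: a first stage that can absorb the second stage's function in one cycle and skip the second stage entirely. The function bodies and the cycle counts are illustrative assumptions, not from the patent.

```python
# Sketch of a two-stage pipeline where stage one can selectably
# perform both functions and bypass stage two, shortening the
# effective pipeline length. Names and numbers are illustrative.

def first_function(x):
    return x + 1          # work normally done by stage one

def second_function(x):
    return x * 2          # work normally done by stage two

def run_pipeline(x, fuse=False):
    """Return (result, cycles). With fuse=True, stage one performs
    both functions in a single pipeline cycle and stage two is
    bypassed; the result is identical but latency drops."""
    if fuse:
        return second_function(first_function(x)), 1
    return second_function(first_function(x)), 2

assert run_pipeline(3) == (8, 2)             # two stages, two cycles
assert run_pipeline(3, fuse=True) == (8, 1)  # same result, one cycle
```

    The key property, mirrored in the asserts, is that fusing changes only the latency, never the computed result.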

    Single cycle instruction pipeline scheduling

    Publication No.: US09959122B2

    Publication Date: 2018-05-01

    Application No.: US13869488

    Filing Date: 2013-04-24

    IPC Classes: G06F15/00 G06F9/46 G06F9/38

    Abstract: A method includes allocating a first single-cycle instruction to a first pipeline that picks single-cycle instructions for execution in program order. The method further includes marking at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to all older single-cycle instructions allocated to the first pipeline being ready and eligible to be picked for execution. An apparatus includes a decoder to decode a first single-cycle instruction and to allocate the first single-cycle instruction to a first pipeline. The apparatus further includes a scheduler to pick single-cycle instructions for execution by the first pipeline in program order and to mark at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to determining that all older single-cycle instructions allocated to the first pipeline are ready and eligible.
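    The in-order readiness rule above can be sketched as a single walk over the pipeline's allocation queue: an instruction's sources are marked ready only while every older instruction is both ready and eligible. The dict-based queue is an assumption made for illustration.

```python
# Illustrative sketch of in-order readiness marking for a pipeline
# that picks single-cycle instructions in program order. The data
# structures are assumptions, not taken from the patent.

def mark_ready(queue):
    """Mark source registers ready in program order, stopping at the
    first older instruction that is not both ready and eligible."""
    older_all_ok = True
    for insn in queue:
        if older_all_ok:
            insn["sources_ready"] = True
        older_all_ok = older_all_ok and insn["ready"] and insn["eligible"]
    return queue

q = [
    {"ready": True,  "eligible": True,  "sources_ready": False},
    {"ready": False, "eligible": True,  "sources_ready": False},  # stalled
    {"ready": True,  "eligible": True,  "sources_ready": False},
]
print([i["sources_ready"] for i in mark_ready(q)])
# → [True, True, False]: the third instruction waits on the stalled one
```

    Because picking is strictly in program order, one stalled instruction blocks the readiness of everything younger than it, which is exactly what the single flag in the loop captures.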

    Variable Length Execution Pipeline
    Invention Application

    Publication No.: US20170102942A1

    Publication Date: 2017-04-13

    Application No.: US15385544

    Filing Date: 2016-12-20

    IPC Classes: G06F9/30 G06F15/80 G06F9/38

    Abstract: In an aspect, a pipelined execution resource can produce an intermediate result for use in an iterative approximation algorithm in an odd number of clock cycles. The pipelined execution resource executes SIMD requests by staggering commencement of execution of the requests from a SIMD instruction. When executing one or more operations for a SIMD iterative approximation algorithm, and an operation for another SIMD iterative approximation algorithm is ready to begin execution, control logic causes intermediate results completed by the pipelined execution resource to pass through a wait state before being used in a subsequent computation. This wait state presents two open scheduling cycles in which both parts of the next SIMD instruction can begin execution. Although the wait state increases latency to complete an in-progress algorithm, the total throughput of execution on the pipeline increases.
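    The latency-for-throughput trade described above can be sketched numerically. The latency value and function name below are illustrative assumptions, not figures from the patent.

```python
# Minimal sketch of the wait-state trade-off: an intermediate result
# that completes in an odd number of cycles is routed through one
# extra wait cycle, opening two adjacent issue slots so that both
# staggered halves of the next SIMD instruction can begin execution.
# The latency value is an illustrative assumption.

LATENCY = 5  # odd number of clock cycles to an intermediate result

def result_available(start_cycle, wait_state=False):
    """Cycle at which the intermediate result feeds the next step."""
    return start_cycle + LATENCY + (1 if wait_state else 0)

# Without the wait state the result is consumed at cycle 5; with it,
# cycles 5 and 6 are both open scheduling cycles for the two staggered
# halves of the next SIMD instruction.
assert result_available(0) == 5
assert result_available(0, wait_state=True) == 6
```

    The in-progress algorithm finishes one cycle later, but the freed pair of adjacent slots lets a second algorithm start, raising total pipeline throughput.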