-
Publication No.: US11797310B2
Publication Date: 2023-10-24
Application No.: US15031285
Filing Date: 2014-10-23
Inventor: Martti Forsell
CPC Classes: G06F9/3867 , G06F9/3851 , G06F9/3873 , G06F9/3875 , G06F9/3885 , G06F15/76
Abstract: A processor architecture arrangement for emulated shared memory (ESM) architectures is disclosed. The arrangement has a number of multi-threaded processors, each provided with an interleaved inter-thread pipeline and a plurality of functional units for carrying out arithmetic and logical operations on data. The pipeline has at least two operatively parallel pipeline branches. The first pipeline branch includes a first sub-group of the plurality of functional units, such as ALUs (arithmetic logic units), for carrying out integer operations. The second pipeline branch includes a second, non-overlapping sub-group of the plurality of functional units, such as FPUs (floating point units), for carrying out floating point operations. One or more of the functional units of at least the second sub-group are located operatively in parallel with the memory access segment of the pipeline.
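The latency-hiding idea in this abstract can be shown with a toy model. All stage counts below are illustrative assumptions (the abstract does not specify them); the point is that an FPU branch running alongside the memory-access segment adds no pipeline length as long as its latency fits within that segment:

```python
# Toy model of a pipeline with two parallel branches: an ALU branch followed
# by a memory-access segment, and an FPU branch overlapping the memory
# segment. Stage counts are illustrative assumptions, not the patent's.

ALU_STAGES = 1   # assumed integer-ALU segment length
MEM_STAGES = 4   # assumed memory-access segment length
FPU_STAGES = 4   # assumed FPU latency, sized to fit the memory segment

def pipeline_length(uses_fpu):
    """Stages an instruction traverses in this simplified model."""
    if uses_fpu:
        # The FPU branch runs in parallel with the memory segment,
        # so the longer of the two determines the traversal time.
        return ALU_STAGES + max(MEM_STAGES, FPU_STAGES)
    return ALU_STAGES + MEM_STAGES
```

With `FPU_STAGES <= MEM_STAGES`, `pipeline_length(True)` equals `pipeline_length(False)`: the floating-point work is hidden behind the memory access.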
-
Publication No.: US20230273797A1
Publication Date: 2023-08-31
Application No.: US18314264
Filing Date: 2023-05-09
IPC Classes: G06F9/38
CPC Classes: G06F9/3873 , G06F9/3838
Abstract: A system and method for reducing pipeline latency. In one embodiment, a processing system includes a processing pipeline. The processing pipeline includes a plurality of processing stages, each configured to continue processing provided by the previous stage. A first of the stages is configured to perform a first function in a pipeline cycle. A second of the stages is disposed downstream of the first of the stages and is configured to perform, in a pipeline cycle, a second function that is different from the first function. The first of the stages is further configured to selectably perform both the first function and the second function in a pipeline cycle, bypassing the second of the stages.
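A minimal sketch of the selectable-function bypass described above. The abstract does not name the two functions, so `+1` and `*2` are placeholder assumptions; the point is the cycle count:

```python
def stage1(x):   # first function (illustrative assumption)
    return x + 1

def stage2(x):   # second, different function (illustrative assumption)
    return x * 2

def run_pipeline(value, fuse):
    """Run `value` through both stages.

    With fuse=True, the first stage performs both functions in a single
    pipeline cycle and the second stage is bypassed, saving one cycle.
    Returns (result, cycles).
    """
    if fuse:
        return stage2(stage1(value)), 1   # one cycle; stage 2 bypassed
    value = stage1(value)                 # cycle 1
    value = stage2(value)                 # cycle 2
    return value, 2
```

Either path produces the same result; only the latency differs.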
-
Publication No.: US11740898B2
Publication Date: 2023-08-29
Application No.: US16714974
Filing Date: 2019-12-16
Inventors: Yao Zhang , Bingrui Wang
IPC Classes: G06F9/30 , G06F7/491 , G06F17/16 , G06F9/38 , G06N20/00 , G06N3/063 , G06N3/08 , G06F13/28 , H03M7/24 , G06F16/901 , G06N3/02 , G06F12/0871 , G06F12/08
CPC Classes: G06F9/30025 , G06F7/491 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/30181 , G06F9/3838 , G06F12/0871 , G06F13/28 , G06F16/9027 , G06N3/02 , G06N3/08 , G06N20/00 , H03M7/24 , G06F9/30101 , G06F9/3873 , G06F9/3877 , G06F17/16 , G06N3/063
Abstract: The present disclosure provides a computation device configured to perform a machine learning computation. The device includes a storage unit, a controller unit, and an operation unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to the one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented as fixed-point data, thereby improving the processing speed and efficiency of training operations.
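The fixed-point representation this abstract relies on is a standard technique; here is a minimal sketch assuming a signed format with 8 fractional bits (the abstract does not fix the width, so `FRAC_BITS` is an assumption):

```python
# Fixed-point sketch: a real number x is stored as the integer
# round(x * 2**FRAC_BITS), so arithmetic runs on integers.

FRAC_BITS = 8  # assumed fractional width

def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a fixed-point integer."""
    return int(round(x * (1 << frac_bits)))

def from_fixed(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values; the raw product carries
    2*frac_bits fractional bits, so rescale back down."""
    return (a * b) >> frac_bits
```

For example, `from_fixed(fixed_mul(to_fixed(1.5), to_fixed(2.0)))` recovers `3.0` while every intermediate step is integer-only.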
-
Publication No.: US11709672B2
Publication Date: 2023-07-25
Application No.: US16715170
Filing Date: 2019-12-16
Inventors: Yao Zhang , Bingrui Wang
IPC Classes: G06F9/30 , G06F7/491 , G06N20/00 , G06F16/901 , G06F13/28 , G06N3/02 , G06N3/08 , G06F9/38 , G06F12/0871 , H03M7/24 , G06N3/063 , G06F17/16
CPC Classes: G06F9/30025 , G06F7/491 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/30181 , G06F9/3838 , G06F12/0871 , G06F13/28 , G06F16/9027 , G06N3/02 , G06N3/08 , G06N20/00 , H03M7/24 , G06F9/30101 , G06F9/3873 , G06F9/3877 , G06F17/16 , G06N3/063
Abstract: The present disclosure provides a computation device configured to perform a machine learning computation. The device includes a storage unit, a controller unit, and an operation unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to the one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented as fixed-point data, thereby improving the processing speed and efficiency of training operations.
-
Publication No.: US09959122B2
Publication Date: 2018-05-01
Application No.: US13869488
Filing Date: 2013-04-24
Inventors: Michael D. Estlick , Jay E. Fleischman , Kevin A. Hurd , Mark M. Gibson , Kelvin D. Goveas , Brian M. Lay
CPC Classes: G06F9/3828 , G06F9/3836 , G06F9/3838 , G06F9/3873 , G06F9/3889
Abstract: A method includes allocating a first single-cycle instruction to a first pipeline that picks single-cycle instructions for execution in program order. The method further includes marking at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to all older single-cycle instructions allocated to the first pipeline being ready and eligible to be picked for execution. An apparatus includes a decoder to decode a first single-cycle instruction and to allocate the first single-cycle instruction to a first pipeline. The apparatus further includes a scheduler to pick single-cycle instructions for execution by the first pipeline in program order and to mark at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to determining that all older single-cycle instructions allocated to the first pipeline are ready and eligible.
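The eager-ready rule above can be sketched in plain Python: because the pipeline picks single-cycle instructions strictly in program order, an instruction's sources can be marked ready as soon as every older instruction in that pipeline is itself ready, since their results are guaranteed to be available by the time it executes. The queue layout is an assumption, and eligibility is folded into the single `ready` flag for brevity:

```python
def mark_ready(queue):
    """queue: list of instructions (dicts with a 'ready' flag) in program
    order for one in-order pipeline. Returns the indices whose source
    registers may be marked ready under the rule above."""
    markable = []
    for i, _ in enumerate(queue):
        if all(older["ready"] for older in queue[:i]):
            markable.append(i)
        else:
            break  # an older instruction is not yet ready; stop here
    return markable
```

Note that instruction 2 below is markable even though it is not itself ready: only its *older* neighbors matter.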
-
Publication No.: US20170315815A1
Publication Date: 2017-11-02
Application No.: US15224624
Filing Date: 2016-07-31
Inventors: Aaron L. Smith , Jan S. Gray
CPC Classes: G06F9/3836 , G06F9/3005 , G06F9/3016 , G06F9/3017 , G06F9/30181 , G06F9/30185 , G06F9/3802 , G06F9/3818 , G06F9/3834 , G06F9/3838 , G06F9/3855 , G06F9/3873 , G06F9/3885 , G06F9/3889 , G06F9/3897 , G06F12/0875 , G06F15/7867
Abstract: Apparatus and methods are disclosed for implementing block-based processors having custom function blocks, including field-programmable gate array (FPGA) implementations. In some examples of the disclosed technology, a dynamically configurable scheduler is configured to issue at least one block-based processor instruction. A custom function block is configured to receive input operands for the instruction and generate ready state data indicating completion of a computation performed for the instruction by the respective custom function block.
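A toy model of a custom function block raising ready state on completion, as the abstract describes. The class layout and the two-operand assumption are illustrative, not the FPGA implementation:

```python
class CustomFunctionBlock:
    """Toy custom function block: latches input operands and, once both
    have arrived, computes its result and raises ready state so the
    scheduler can see the computation is complete."""

    def __init__(self, fn):
        self.fn = fn           # the custom computation (assumed binary)
        self.operands = {}     # operand slot -> value
        self.result = None
        self.ready = False     # the "ready state data" of the abstract

    def receive(self, slot, value):
        """Deliver one input operand to the given slot (0 or 1)."""
        self.operands[slot] = value
        if len(self.operands) == 2:   # both operands present: compute
            self.result = self.fn(self.operands[0], self.operands[1])
            self.ready = True         # signal completion to the scheduler
```

A scheduler would poll `ready` (or receive it as a wire in hardware) before issuing dependent instructions.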
-
Publication No.: US20170315813A1
Publication Date: 2017-11-02
Application No.: US15224473
Filing Date: 2016-07-29
Inventors: Aaron L. Smith , Jan S. Gray
CPC Classes: G06F9/3836 , G06F9/3005 , G06F9/3016 , G06F9/3017 , G06F9/30181 , G06F9/30185 , G06F9/3802 , G06F9/3818 , G06F9/3834 , G06F9/3838 , G06F9/3855 , G06F9/3873 , G06F9/3885 , G06F9/3889 , G06F9/3897 , G06F12/0875 , G06F15/7867
Abstract: Apparatus and methods are disclosed for implementing incremental schedulers for out-of-order block-based processors, including field-programmable gate array implementations. In one example of the disclosed technology, a processor includes an instruction scheduler formed by configuring one or more look-up-table RAMs to store ready state data for a plurality of instructions in an instruction block. The instruction scheduler further includes a plurality of queues that store ready state data for the processor and send dependency information to ready-determination logic on a first-in/first-out basis. The instruction scheduler selects one or more of the ready instructions to be issued and executed by the block-based processor.
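The ready-state issue loop can be sketched as a dataflow scheduler with a FIFO ready queue. This is a generic software sketch of the scheduling idea, not the patented LUT-RAM implementation; the dependency encoding is an assumption:

```python
from collections import deque

def schedule_block(deps):
    """deps: {instr_id: set of producer instr_ids it waits on}.
    Returns the order in which instructions issue: an instruction
    becomes ready once all its producers have issued, and ready
    instructions issue first-in/first-out."""
    remaining = {i: set(d) for i, d in deps.items()}
    ready = deque(sorted(i for i, d in remaining.items() if not d))
    order = []
    while ready:
        instr = ready.popleft()          # FIFO issue of ready instructions
        order.append(instr)
        for consumer, needs in remaining.items():
            if instr in needs:           # wake consumers of this result
                needs.discard(instr)
                if not needs and consumer not in order and consumer not in ready:
                    ready.append(consumer)
    return order
```

For example, with instruction 1 consuming 0, and 2 consuming both, `schedule_block({0: set(), 1: {0}, 2: {0, 1}})` issues them as `[0, 1, 2]`.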
-
Publication No.: US20170315812A1
Publication Date: 2017-11-02
Application No.: US15224471
Filing Date: 2016-07-29
Inventors: Aaron L. Smith , Jan S. Gray
CPC Classes: G06F9/3836 , G06F9/3005 , G06F9/3016 , G06F9/3017 , G06F9/30181 , G06F9/30185 , G06F9/3802 , G06F9/3818 , G06F9/3834 , G06F9/3838 , G06F9/3855 , G06F9/3873 , G06F9/3885 , G06F9/3889 , G06F9/3897 , G06F12/0875 , G06F15/7867
Abstract: Apparatus and methods are disclosed for implementing block-based processors, including field-programmable gate array (FPGA) implementations. In one example of the disclosed technology, a processor includes an instruction decoder configured to generate ready state data for a set of instructions in an instruction block, each of the set of instructions being associated with a different instruction identifier encoded in the instruction block, and a parallel instruction scheduler configured to issue an instruction from the set of instructions based on the decoded ready state data. In some examples, the parallel instruction scheduler allows for improved area and energy savings according to the size and type of FPGA components available.
-
Publication No.: US20170192793A1
Publication Date: 2017-07-06
Application No.: US14986463
Filing Date: 2015-12-31
CPC Classes: G06F9/3867 , G06F9/3001 , G06F9/30021 , G06F9/3802 , G06F9/3869 , G06F9/3873
Abstract: Efficient instruction processing for sparse data includes extensions to a processor pipeline that identify zero-optimizable instructions (those with at least one zero input operand) and bypass the execute stage of the pipeline, determining the result of the operation without executing the instruction. When possible, the extensions also bypass the writeback stage of the pipeline.
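The zero-optimizable cases can be sketched as a cheap check before the execute stage. The particular op set shown is an illustrative assumption; the general pattern is that a zero operand makes the result known without computation:

```python
def zero_optimize(op, a, b):
    """If the operation's result is determined by a zero operand, return
    (result, True) so the execute stage can be bypassed; otherwise return
    (None, False), meaning normal execution is required."""
    if op == "mul" and (a == 0 or b == 0):
        return 0, True        # x * 0 == 0: no multiply needed
    if op == "add" and a == 0:
        return b, True        # 0 + x == x: forward the other operand
    if op == "add" and b == 0:
        return a, True
    return None, False
```

When the forwarded result also matches the destination's current value, the writeback stage can be skipped as well, which is the second bypass the abstract mentions.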
-
Publication No.: US20170102942A1
Publication Date: 2017-04-13
Application No.: US15385544
Filing Date: 2016-12-20
Inventors: Kristie Veith , Leonard Rarick , Manouk Manoukian
CPC Classes: G06F9/3001 , G06F9/30145 , G06F9/3836 , G06F9/3867 , G06F9/3873 , G06F9/3887 , G06F15/8007
Abstract: In an aspect, a pipelined execution resource can produce an intermediate result for use in an iterative approximation algorithm in an odd number of clock cycles. The pipelined execution resource executes SIMD requests by staggering commencement of execution of the requests from a SIMD instruction. When executing one or more operations for a SIMD iterative approximation algorithm, and an operation for another SIMD iterative approximation algorithm is ready to begin execution, control logic causes intermediate results completed by the pipelined execution resource to pass through a wait state before being used in a subsequent computation. This wait state presents two open scheduling cycles in which both parts of the next SIMD instruction can begin execution. Although the wait state increases latency to complete an in-progress algorithm, the total throughput of execution on the pipeline increases.
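The latency/throughput trade-off in this abstract can be put in rough numbers. The per-iteration latency below is an assumption; the model only shows that inserting a one-cycle wait state lengthens the in-progress algorithm while opening two back-to-back issue slots for the staggered halves of the next SIMD request:

```python
STEP_LATENCY = 3   # assumed odd per-iteration latency, in cycles

def finish_cycle(iterations, wait_state):
    """Cycle at which an iterative approximation of `iterations`
    dependent steps completes; the wait state adds one cycle per step."""
    extra = 1 if wait_state else 0
    return iterations * (STEP_LATENCY + extra)

def open_slots_per_step(wait_state):
    """Consecutive free issue slots between dependent steps: with the
    wait state, two open up, enough for both staggered SIMD halves."""
    return 2 if wait_state else 1
```

So a four-iteration algorithm finishes at cycle 16 instead of 12, but each step now exposes two consecutive slots in which another SIMD pair can start, raising total pipeline throughput.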
-