PARALLEL PROCESSING IN HARDWARE ACCELERATORS COMMUNICABLY COUPLED WITH A PROCESSOR
    12.
    Patent application
    Status: in force

    Publication number: US20160132329A1

    Publication date: 2016-05-12

    Application number: US14539674

    Filing date: 2014-11-12

    Abstract: In an embodiment, a device including a processor, a plurality of hardware accelerator engines, and a hardware scheduler is disclosed. The processor is configured to schedule an execution of a plurality of instruction threads, where each instruction thread includes a plurality of instructions associated with an execution sequence. The plurality of hardware accelerator engines performs the scheduled execution of the plurality of instruction threads. The hardware scheduler is configured to control the scheduled execution such that each hardware accelerator engine is configured to execute a corresponding instruction and the plurality of instructions are executed by the plurality of hardware accelerator engines in a sequential manner. The plurality of instruction threads are executed by the plurality of hardware accelerator engines in a parallel manner based on the execution sequence and an availability status of each of the plurality of hardware accelerator engines.
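The scheduling behaviour described in the abstract can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the patented hardware design: it assumes that instruction k of every thread runs on engine k, that every instruction takes exactly one cycle, and that lower-numbered threads win ties. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(num_threads, num_engines):
    """Return one dict per cycle mapping engine index -> thread index.

    Assumptions (not from the patent): instruction k of each thread runs
    on engine k, each instruction takes one cycle, and ties go to the
    lower-numbered thread.
    """
    # next_stage[t] is the next instruction (and therefore engine) thread t needs.
    next_stage = [0] * num_threads
    timeline = []
    while any(stage < num_engines for stage in next_stage):
        busy = {}  # availability status: engines claimed during this cycle
        for thread in range(num_threads):
            engine = next_stage[thread]
            # A thread advances only if it has work left and its engine is free.
            if engine < num_engines and engine not in busy:
                busy[engine] = thread
        # Each scheduled thread moves on to its next instruction in sequence.
        for engine, thread in busy.items():
            next_stage[thread] += 1
        timeline.append(busy)
    return timeline


if __name__ == "__main__":
    for cycle, assignment in enumerate(simulate(3, 3)):
        print(cycle, sorted(assignment.items()))
```

Within each thread the instructions still run strictly in order, while different threads occupy different engines during the same cycle, so under these assumptions the threads overlap like stages in a pipeline.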


    REGISTER FILE STRUCTURES COMBINING VECTOR AND SCALAR DATA WITH GLOBAL AND LOCAL ACCESSES
    13.
    Patent application
    Status: in force

    Publication number: US20150019836A1

    Publication date: 2015-01-15

    Application number: US14327066

    Filing date: 2014-07-09

    CPC classification number: G06F9/30036 G06F9/30014 G06F9/30094 G06F9/3012

    Abstract: The number of registers required is reduced by overlapping scalar and vector registers. This also allows increased compiler flexibility when mixing scalar and vector instructions. Local register read ports are minimized by restricting read access. Dedicated predicate registers reduce requirements for general registers, and allow reduction of critical timing paths by allowing the predicate registers to be placed next to the predicate unit.
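One way to picture overlapped scalar and vector registers is a single storage array that both views share. The sketch below is an illustration only: the abstract does not say how the overlap is laid out, so the choice that scalar register i aliases lane 0 of vector register i is an assumption, as are the class name `OverlappedRegisterFile`, its methods, and the default sizes.

```python
class OverlappedRegisterFile:
    """Toy register file in which scalar and vector registers share storage.

    Assumption (not from the patent): scalar register i is lane 0 of
    vector register i, so no separate scalar storage is needed.
    """

    def __init__(self, num_regs=4, lanes=4):
        self.lanes = lanes
        self.storage = [[0] * lanes for _ in range(num_regs)]

    def write_vector(self, index, values):
        if len(values) != self.lanes:
            raise ValueError("vector width must match the lane count")
        self.storage[index] = list(values)

    def read_vector(self, index):
        return list(self.storage[index])

    def write_scalar(self, index, value):
        # Writes only lane 0; the remaining lanes keep their contents.
        self.storage[index][0] = value

    def read_scalar(self, index):
        return self.storage[index][0]


if __name__ == "__main__":
    rf = OverlappedRegisterFile()
    rf.write_vector(1, [10, 20, 30, 40])
    print(rf.read_scalar(1))   # scalar view of the same register
    rf.write_scalar(1, 99)
    print(rf.read_vector(1))   # vector view reflects the scalar write
```

Because both views address the same cells, a compiler mixing scalar and vector instructions could, under this assumed layout, pass a value between them without a copy.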

