Abstract:
The number of registers required is reduced by overlapping scalar and vector registers. This overlap also gives the compiler increased flexibility when mixing scalar and vector instructions. Local register read ports are minimized by restricting read access. Dedicated predicate registers reduce the demand on general registers and allow critical timing paths to be shortened by placing the predicate registers next to the predicate unit.
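A minimal sketch of what an overlapped scalar/vector register file could look like, assuming the scalar view of register n aliases lane 0 of vector register n; the register count, lane width, and method names are illustrative and not taken from the abstract.

```cpp
#include <array>
#include <cstdint>
#include <iostream>

constexpr int kNumRegs  = 16;  // assumed number of architectural registers
constexpr int kVecLanes = 8;   // assumed vector width in 32-bit lanes

struct RegisterFile {
    // One physical bank backs both views: Vn is the whole row,
    // the scalar Rn is lane 0 of the same row, so no separate
    // scalar file (and its extra ports) is needed.
    std::array<std::array<uint32_t, kVecLanes>, kNumRegs> bank{};

    uint32_t readScalar(int n) const          { return bank[n][0]; }
    void     writeScalar(int n, uint32_t v)   { bank[n][0] = v; }

    const std::array<uint32_t, kVecLanes>& readVector(int n) const { return bank[n]; }
    void writeVector(int n, const std::array<uint32_t, kVecLanes>& v) { bank[n] = v; }
};

int main() {
    RegisterFile rf;
    rf.writeVector(3, {10, 11, 12, 13, 14, 15, 16, 17});
    // A later scalar instruction can name the same register number,
    // which is what gives the compiler freedom to mix scalar and
    // vector uses of one register set.
    std::cout << rf.readScalar(3) << "\n";  // prints 10
}
```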
Abstract:
In an embodiment, a device including a processor, a plurality of hardware accelerator engines, and a hardware scheduler is disclosed. The processor is configured to schedule an execution of a plurality of instruction threads, where each instruction thread includes a plurality of instructions associated with an execution sequence. The plurality of hardware accelerator engines performs the scheduled execution of the plurality of instruction threads. The hardware scheduler is configured to control the scheduled execution such that each hardware accelerator engine executes a corresponding instruction and the plurality of instructions are executed by the plurality of hardware accelerator engines in a sequential manner. The plurality of instruction threads are executed by the plurality of hardware accelerator engines in a parallel manner based on the execution sequence and an availability status of each of the plurality of hardware accelerator engines.
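A minimal sketch of the dispatch pattern described above, assuming instruction slot i of each thread maps to engine i and each engine holds one thread per time step; the engine names, thread count, and lock-step timing model are illustrative assumptions, not the disclosed scheduler.

```cpp
#include <iostream>
#include <string>
#include <vector>

int main() {
    const std::vector<std::string> engines = {"ENG0", "ENG1", "ENG2"};
    const int numThreads = 4;  // assumed number of instruction threads

    // Within a thread, instruction 0 runs on ENG0, then instruction 1 on
    // ENG1, and so on (sequential per thread). A new thread is admitted
    // whenever ENG0 becomes available, so different threads occupy
    // different engines at the same step (parallel across threads).
    const int steps = numThreads + static_cast<int>(engines.size()) - 1;
    for (int t = 0; t < steps; ++t) {
        std::cout << "step " << t << ":";
        for (int e = 0; e < static_cast<int>(engines.size()); ++e) {
            const int thread = t - e;  // thread occupying engine e now
            if (thread >= 0 && thread < numThreads)
                std::cout << "  " << engines[e] << "<-T" << thread
                          << ".insn" << e;
        }
        std::cout << "\n";
    }
}
```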