Abstract:
Apparatus and methods are disclosed for example computer processors that are based on a hybrid dataflow execution model. Embodiments of the disclosed technology use read instructions to retrieve a value from a specified register in the register file of the processor architecture and send the value for use by one or more targets (e.g., other instructions in the instruction block). The read instruction may be predicated such that the instruction is only executed when a predicate condition is satisfied. In some examples of the disclosed technology, a compiler for such processors performs an analysis of the source and/or object code being compiled in order to determine whether operation(s) along conditional paths can be executed before or concurrently with determination of a condition on which the conditional operation(s) depend, thus improving processor efficiency.
Abstract:
Systems, apparatuses, and methods related to a block-based processor core composition register are disclosed. In one example of the disclosed technology, a processor can include a plurality of block-based processor cores for executing a program including a plurality of instruction blocks. A respective block-based processor core can include one or more sharable resources and a programmable composition control register. The programmable composition control register can be used to configure which resources of the one or more sharable resources are shared with other processor cores of the plurality of processor cores.
Abstract:
Distinct system registers for logical processors are disclosed. In one example of the disclosed technology, a processor includes a plurality of block-based physical processor cores for executing a program comprising a plurality of instruction blocks. The processor also includes a thread scheduler configured to schedule a thread of the program for execution, the thread using the one or more instruction blocks. The processor further includes at least one system register. The at least one system register stores data indicating a number and placement of the plurality of physical processor cores to form a logical processor. The logical processor executes the scheduled thread. The logical processor is configured to execute the thread in a continuous instruction window.
Abstract:
Apparatus and methods are disclosed for dynamic nullification of memory access instructions, such as memory store instructions. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores. One of the cores can include an execution unit configured to execute memory access instructions comprising a plurality of memory load and/or memory store instructions contained in an instruction block. The core can also include a hardware structure storing data for at least one predicate instruction in the instruction block, the data identifying whether one or more of the memory store instructions will issue if a condition of the predicate instruction is satisfied. The core may further include a control unit configured to control issuing of the memory access instructions to the execution unit based at least in a part on the hardware structure data.
Abstract:
Apparatus and methods are disclosed for example computer processors that are based on a hybrid dataflow execution model. In particular embodiments, a processor core in a block-based processor comprises: one or more functional units configured to perform functions using one or more operands; an instruction window comprising buffers configured to store individual instructions for execution by the processor core, the instruction window including one or more operand buffers for an individual instruction configured to store operand values; a control unit configured to execute the instructions in the instruction window and control operation of the one or more functional units; and a broadcast value store comprising a plurality of buffers dedicated to storing broadcast values, each buffer of the broadcast value store being associated with a respective broadcast channel from among a plurality of available broadcast channels.
Abstract:
Apparatus and methods are disclosed for nullifying memory store instructions identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain an instruction identification for a memory access instruction of a plurality of memory access instructions, based on a target field of the nullification instruction. The memory access instruction associated with the instruction identification is nullified. The memory access instruction is in a first instruction block of the plurality of instruction blocks. Based on the nullified memory access instruction, a subsequent memory access instruction from the first instruction block is executed.
Abstract:
The disclosed technology can be used for executing and committing instruction blocks of a block-based processor architecture out-of-order. In one example of the disclosed technology, an apparatus can include a plurality of block-based processor cores which can include a first group of cores and a second group of cores. The first group of cores can be configured to commit instruction blocks of the set of instruction blocks in a sequential program order. The second group of cores can be configured to commit instruction blocks of the set of instruction blocks out-of-order relative to the sequential program order.
Abstract:
Various embodiments are generally directed to techniques for assigning instances of blocks of instructions of a routine to one of multiple types of core of a heterogeneous set of cores of a processor component. An apparatus to select types of cores includes a processor component; a core selection component for execution by the processor component to select a core of multiple cores to execute an initial subset of multiple instances of an instruction block in parallel based on characteristics of instructions of the instruction block, and to select a core of the multiple cores to execute remaining instances of the multiple instances of the instruction block in parallel based on characteristics of execution of the initial subset stored in an execution database; and a monitoring component for execution by the processor component to record the characteristics of execution of the initial subset in the execution database. Other embodiments are described and claimed.
Abstract:
An architecture fear microprocessors and the like in which instructions include a type identifier, which selects one of several inteipretation registers. The mterpretation registers hold Mopnatiøn for interpreting the opcode of each instruction, so that a stream of compressed instructions (with type identifiers) can be translated into a stream of expanded instructions. Preferably the type identifiers also distinguish sequencer instructions from processing-element instructions, and can even distinguish among different types of sequencer instructions (as well as among different types of processing-element instructions).
Abstract:
An architecture fear microprocessors and the like in which instructions include a type identifier, which selects one of several inteipretation registers. The mterpretation registers hold Moπnatiøn for interpreting the opcode of each instruction, so that a stream of compressed instructions (with type identifiers) can be translated into a stream of expanded instructions. Preferably the type identifiers also distinguish sequencer instructions from processing-element instructions, and can even distinguish among different types of sequencer instructions (as well as among different types of processing-element instructions).