Abstract:
Apparatus and methods are disclosed for example computer processors that are based on a hybrid dataflow execution model. Embodiments of the disclosed technology use read instructions to retrieve a value from a specified register in the register file of the processor architecture and send the value for use by one or more targets (e.g., other instructions in the instruction block). The read instruction may be predicated such that the instruction is only executed when a predicate condition is satisfied. In some examples of the disclosed technology, a compiler for such processors performs an analysis of the source and/or object code being compiled in order to determine whether operation(s) along conditional paths can be executed before or concurrently with determination of a condition on which the conditional operation(s) depend, thus improving processor efficiency.
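The predicated read described above can be illustrated with a minimal sketch. The names `ReadInstruction`, `execute_read`, and the operand-slot labels are hypothetical and chosen only for illustration; the abstract does not specify an encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadInstruction:
    register: int                     # index into the register file
    targets: list                     # hypothetical operand-slot names that receive the value
    predicate: Optional[bool] = None  # None means unpredicated


def execute_read(instr, register_file, operand_slots, condition=None):
    """Send the register value to every target if the predicate is satisfied.

    Returns True when the read executed, False when it was skipped.
    """
    if instr.predicate is not None and condition != instr.predicate:
        return False  # predicate condition not satisfied: do not execute
    value = register_file[instr.register]
    for target in instr.targets:
        operand_slots[target] = value
    return True
```

An unpredicated read always delivers its value, while a read predicated on a true condition is skipped when the condition evaluates to false.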
Abstract:
Systems, apparatuses, and methods related to a block-based processor core composition register are disclosed. In one example of the disclosed technology, a processor can include a plurality of block-based processor cores for executing a program including a plurality of instruction blocks. A respective block-based processor core can include one or more sharable resources and a programmable composition control register. The programmable composition control register can be used to configure which resources of the one or more sharable resources are shared with other processor cores of the plurality of processor cores.
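One way to picture a programmable composition control register is as a bit field with one bit per sharable resource. The sketch below assumes that layout; the resource names and bit positions are hypothetical and are not taken from the abstract.

```python
from enum import IntFlag


class Sharable(IntFlag):
    # Hypothetical resources and bit positions, for illustration only.
    L1_DATA_CACHE = 1 << 0
    REGISTER_FILE = 1 << 1
    EXECUTION_UNIT = 1 << 2


class Core:
    def __init__(self, core_id):
        self.core_id = core_id
        self.composition_control = Sharable(0)  # nothing shared by default

    def write_composition_control(self, value):
        """Program the register; each set bit marks a resource as shared."""
        self.composition_control = Sharable(value)

    def is_shared(self, resource):
        return bool(self.composition_control & resource)
```

Writing `Sharable.L1_DATA_CACHE | Sharable.EXECUTION_UNIT` to a core's register would, under this assumed layout, share those two resources with other cores while keeping the register file private.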
Abstract:
Distinct system registers for logical processors are disclosed. In one example of the disclosed technology, a processor includes a plurality of block-based physical processor cores for executing a program comprising a plurality of instruction blocks. The processor also includes a thread scheduler configured to schedule a thread of the program for execution, the thread using one or more of the instruction blocks. The processor further includes at least one system register. The at least one system register stores data indicating a number and placement of the plurality of physical processor cores to form a logical processor. The logical processor executes the scheduled thread. The logical processor is configured to execute the thread in a continuous instruction window.
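As a rough illustration of a register that records both the number and the placement of physical cores, the sketch below assumes a simple bitmask encoding in which bit i is set when physical core i belongs to the logical processor. This encoding and the function names are assumptions; the abstract does not state how the data is laid out.

```python
def encode_composition(core_ids):
    """Build a register value with one bit set per participating physical core."""
    reg = 0
    for core_id in core_ids:
        reg |= 1 << core_id
    return reg


def decode_composition(reg):
    """Return (number of cores, list of core positions) from the register value."""
    cores = [i for i in range(reg.bit_length()) if (reg >> i) & 1]
    return len(cores), cores
```

Under this assumed encoding, the population count of the register gives the number of cores and the set bit positions give their placement.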
Abstract:
Apparatus and methods are disclosed for dynamic nullification of memory access instructions, such as memory store instructions. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores. One of the cores can include an execution unit configured to execute memory access instructions comprising a plurality of memory load and/or memory store instructions contained in an instruction block. The core can also include a hardware structure storing data for at least one predicate instruction in the instruction block, the data identifying whether one or more of the memory store instructions will issue if a condition of the predicate instruction is satisfied. The core may further include a control unit configured to control issuing of the memory access instructions to the execution unit based at least in part on the hardware structure data.
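The hardware structure described above can be modeled, very loosely, as a table keyed by predicate instruction that lists which stores will issue for each outcome of the condition. The table contents, the store identifiers, and the function name below are hypothetical.

```python
# Hypothetical contents: predicate instruction id -> {condition outcome: store ids that issue}
STORE_TABLE = {
    "p0": {True: {0, 1}, False: {2}},
}
ALL_STORES = {0, 1, 2}  # every store id in this illustrative instruction block


def nullified_stores(pred_id, outcome):
    """Stores that will not issue for this predicate outcome and can be treated as nullified."""
    return ALL_STORES - STORE_TABLE[pred_id][outcome]
```

A control unit could consult such a table once the predicate resolves, so that stores on the path not taken do not hold up later memory accesses.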
Abstract:
Apparatus and methods are disclosed for example computer processors that are based on a hybrid dataflow execution model. In particular embodiments, a processor core in a block-based processor comprises: one or more functional units configured to perform functions using one or more operands; an instruction window comprising buffers configured to store individual instructions for execution by the processor core, the instruction window including one or more operand buffers for an individual instruction configured to store operand values; a control unit configured to execute the instructions in the instruction window and control operation of the one or more functional units; and a broadcast value store comprising a plurality of buffers dedicated to storing broadcast values, each buffer of the broadcast value store being associated with a respective broadcast channel from among a plurality of available broadcast channels.
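A broadcast value store with one dedicated buffer per broadcast channel might be sketched as follows. The class and method names are hypothetical, and the sketch models only the storage, not the instruction window or functional units.

```python
class BroadcastValueStore:
    """One buffer per broadcast channel; an illustrative model only."""

    def __init__(self, num_channels):
        self._buffers = [None] * num_channels
        self._valid = [False] * num_channels

    def write(self, channel, value):
        """A producer instruction broadcasts a value on a channel."""
        self._buffers[channel] = value
        self._valid[channel] = True

    def read(self, channel):
        """A consumer instruction reads the value held for a channel."""
        if not self._valid[channel]:
            raise LookupError(f"no value broadcast on channel {channel}")
        return self._buffers[channel]
```

Because each channel has its own buffer, any number of consumer instructions can read a broadcast value without the producer naming each of them as a target.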
Abstract:
Apparatus and methods are disclosed for nullifying memory store instructions identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain an instruction identification for a memory access instruction of a plurality of memory access instructions, based on a target field of the nullification instruction. The memory access instruction associated with the instruction identification is nullified. The memory access instruction is in a first instruction block of the plurality of instruction blocks. Based on the nullified memory access instruction, a subsequent memory access instruction from the first instruction block is executed.
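The effect of nullifying a memory access instruction named in a target field can be sketched as below, assuming that memory accesses in a block carry sequential identifiers and must complete in identifier order, and that the target field holds such an identifier directly. Both assumptions and the function names are illustrative.

```python
def next_ready(ids, done):
    """Return the lowest memory-access id not yet done (issued or nullified), or None."""
    for access_id in sorted(ids):
        if access_id not in done:
            return access_id
    return None


def nullify(target_field, done):
    """Mark the memory access named by the target field as if it had completed."""
    done.add(target_field)
```

Once the nullified access is marked as done, the subsequent memory access in the same block becomes the next one eligible to execute.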
Abstract:
The disclosed technology can be used for executing and committing instruction blocks of a block-based processor architecture out-of-order. In one example of the disclosed technology, an apparatus can include a plurality of block-based processor cores which can include a first group of cores and a second group of cores. The first group of cores can be configured to commit instruction blocks of a set of instruction blocks in a sequential program order. The second group of cores can be configured to commit instruction blocks of the set of instruction blocks out-of-order relative to the sequential program order.
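The difference between the two commit policies can be illustrated with a small sketch. It assumes each instruction block carries a contiguous sequence number reflecting program order; the function name and this numbering are hypothetical.

```python
def commit_order(completion_order, in_order):
    """Return the order in which blocks commit.

    completion_order: block sequence numbers in the order they finish executing.
    in_order: True for the sequential-commit group, False for the out-of-order group.
    """
    if not in_order:
        return list(completion_order)  # commit as soon as each block finishes
    committed, finished = [], set()
    next_seq = min(completion_order) if completion_order else 0
    for seq in completion_order:
        finished.add(seq)
        # Drain every block that is now next in program order.
        while next_seq in finished:
            committed.append(next_seq)
            next_seq += 1
    return committed
```

With blocks finishing in the order 2, 0, 1, 3, the in-order group holds block 2 until blocks 0 and 1 have committed, whereas the out-of-order group commits block 2 first.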
Abstract:
According to one aspect, a computer system includes a configuration with a machine enabled to operate in a single thread (ST) mode and a multithreading (MT) mode. In addition, the machine includes physical threads. The machine is configured to perform a method that includes issuing a start-virtual-execution (start-VE) instruction to dispatch a guest entity having multiple logical threads on the core. The guest entity includes all or a part of a guest virtual machine (VM), and issuing is performed by a host running on one of the physical threads on the core in the ST mode. The executing of the start-VE instruction by the machine includes mapping each of the logical threads to a corresponding one of the physical threads, initializing each of the mapped physical threads with a state of the corresponding logical thread, and starting execution of the guest entity on the core in MT mode.
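The three steps of start-VE execution (map, initialize, start in MT mode) can be sketched as follows. The sketch assumes an identity mapping from logical thread i to physical thread i and models thread state as a dictionary; both are simplifications, and the function name is hypothetical.

```python
def start_ve(logical_states, physical_threads):
    """Map logical threads to physical threads, load their state, and enter MT mode.

    logical_states: list of dicts, one per guest logical thread.
    physical_threads: list of dicts standing in for physical thread state.
    """
    if len(logical_states) > len(physical_threads):
        raise ValueError("not enough physical threads for the guest entity")
    mapping = {}
    for logical_id, state in enumerate(logical_states):
        physical_id = logical_id          # assumed identity mapping
        physical = physical_threads[physical_id]
        physical.clear()
        physical.update(state)            # initialize with the logical thread's state
        mapping[logical_id] = physical_id
    return {"mode": "MT", "mapping": mapping}
```

Physical threads that have no corresponding logical thread are left untouched in this sketch.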
Abstract:
Embodiments relate to multithreading in a computer. An aspect is a computer including a configuration having a core which includes physical threads and is operable in single thread (ST) and multithreading (MT) modes. The computer also includes a host program configured to execute in the ST mode on the core to issue a start-virtual-execution (start-VE) instruction to dispatch a guest entity which includes a guest virtual machine (VM). The start-VE instruction is executed by the core and includes obtaining a state description, having a guest state, from a location specified by the start-VE instruction. The execution includes determining, based on the guest state, whether the guest entity includes a single guest thread or multiple guest threads, and starting the guest threads in the MT mode or ST mode based on the guest state and a determination of whether the guest entity includes a single guest thread or multiple guest threads.
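The mode decision described above can be sketched as a lookup of the state description followed by a count of guest threads. The layout of the state description (a dictionary with a `guest_threads` list) and the function name are assumptions made for illustration.

```python
def dispatch_guest(memory, state_description_address):
    """Fetch the state description and choose ST or MT mode from the guest state."""
    state_description = memory[state_description_address]
    guest_threads = state_description["guest_threads"]  # hypothetical field name
    mode = "MT" if len(guest_threads) > 1 else "ST"
    return mode, list(guest_threads)
```

A guest entity with a single guest thread is started in ST mode, and one with several guest threads is started in MT mode.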