Abstract:
Systems, apparatuses, and methods related to a block-based processor core topology register are disclosed. In one example of the disclosed technology, a processor can include a plurality of block-based processor cores for executing a program including a plurality of instruction blocks. A respective block-based processor core can include a sharable resource and a programmable composition topology register. The programmable composition topology register can be used to assign a group of the physical processor cores that share the sharable resource.
Abstract:
Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that generates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes decoding an instruction block encoding a plurality of memory access instructions and generating data indicating a relative order for executing the memory access instructions in the instruction block and scheduling operation of a portion of the instruction block based at least in part on the relative order data. In some examples, a store vector data register can store the generated relative ordering data for use in subsequent instances of the instruction block.
Abstract:
A method, system and program product for loading or storing memory data wherein the address of the memory operand is based an offset of the program counter rather than an explicitly defined address location. The offset is defined by an immediate field of the instruction which is sign extended and is aligned as a halfword address when added to the value of the program counter.
Abstract:
A method, system and program product for loading or storing memory data wherein the address of the memory operand is based an offset of the program counter rather than an explicitly defined address location. The offset is defined by an immediate field of the instruction which is sign extended and is aligned as a halfword address when added to the value of the program counter.
Abstract:
The invention relates to a microprocessor for processing different assemblers. A parameter is provided in the microprocessor, whereby said parameter marks the respective assembler code. Different relative addressing occurs according to the setting of the parameter. The invention also relates to methods for relative addressing in a microprocessor. Relative addresses are determined differently according to the operating state or parameter for the respective assembler code.
Abstract:
Technology related to prefetching instruction blocks is disclosed. In one example of the disclosed technology, a processor comprises a block-based processor core for executing a program comprising a plurality of instruction blocks. The block-based processor core can include prefetch logic and a local buffer. The prefetch logic can be configured to receive a reference to a predicted instruction block and to determine a mapping of the predicted instruction block to one or more lines. The local buffer can be configured to selectively store portions of the predicted instruction block and to provide the stored portions of the predicted instruction block when control of the program passes along a predicted execution path to the predicted instruction block.
Abstract:
Technology related to prefetching data associated with predicated stores of programs in block-based processor architectures is disclosed. In one example of the disclosed technology, a processor includes a block-based processor core for executing an instruction block comprising a plurality of instructions. The block-based processor core includes decode logic and prefetch logic. The decode logic is configured to detect a predicated store instruction of the instruction block. The prefetch logic is configured to calculate a target address of the predicated store instruction and initiate a memory operation associated with the calculated target address before a predicate of the predicated store instruction is calculated.
Abstract:
Apparatus and methods are disclosed for controlling instruction flow in block-based processor architectures. In one example of the disclosed technology, an instruction block address register stores an index address to a memory storing a plurality of instructions for an instruction block, the indexed address being inaccessible when the processor is in one or more unprivileged operational modes, one or more execution units configured to execute instructions for the instruction block, and a control unit configured to fetch and decode two or more of the plurality of instructions from the memory based on the indexed address.
Abstract:
Apparatus and methods are disclosed for decoding targets from an instruction and transmitting data to those targets in accordance with a current instruction. Multimodal target hardware is used in conjunction with one or more of the routers so as to route data to an appropriate target. The data can be one or more operands or a predicate and the targets can include operand buffers, broadcast channels, and general registers. In this way, operands, for example, can be directed for use with multiple subsequent instructions, and there are multiple modes for distributing the operands to the multiple instructions.
Abstract:
Apparatus and methods are disclosed for generating and using block branch metadata in block-based processor architectures. In one example of the disclosed technology, a block-based processor is configured to dynamically generate metadata representing control flow, exit points, and control flow probabilities for an instruction block while decoding and executing the block. The metadata can be used with subsequent invocations of the instruction block for branch and memory dependence predictions. In some examples, an incomplete portion of a control flow representation is generated for a number of predicated instructions and stored in a memory or storage device for enhancing prediction and prefetch for subsequent invocations of an instruction block.