Abstract:
Technology related to out-of-order processor architectures is disclosed. In one example of the disclosed technology, a processor includes decode logic and issue logic. The decode logic is configured to decode a store mask of an instruction block. The instruction block can include load and store instructions. Each load and store instruction includes an identifier specifying a relative program order of the load or store instruction within the instruction block. The store mask identifies positions of the store instructions within the program order of the instruction block. The issue logic is configured to issue at least one of the instructions of the instruction block out of program order. The issue logic can be configured to use the decoded store mask to only issue load instructions after all store instructions preceding the load instructions have issued.
Abstract:
Systems, apparatuses, and methods related to a block-based processor core topology register are disclosed. In one example of the disclosed technology, a processor can include a plurality of block-based processor cores for executing a program including a plurality of instruction blocks. A respective block-based processor core can include a sharable resource and a programmable composition topology register. The programmable composition topology register can be used to assign a group of the physical processor cores that share the sharable resource.
Abstract:
Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that generates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes decoding an instruction block encoding a plurality of memory access instructions and generating data indicating a relative order for executing the memory access instructions in the instruction block and scheduling operation of a portion of the instruction block based at least in part on the relative order data. In some examples, a store vector data register can store the generated relative ordering data for use in subsequent instances of the instruction block.
Abstract:
Different processor architectures are described to evaluate and track dependencies required by instructions. The processors may hold or queue instructions that require output of other instructions until required data and resources are available which may remove the requirement of NOPs in the instruction memory to resolve dependencies and pipeline hazards. The processor may divide instruction data into bundles for parallel execution and provide speculative execution. The processor may include various components to implement an evaluation unit, execution unit and termination unit.
Abstract:
Systems, methods, and computer-readable storage are disclosed for providing early access to target addresses in block-based processor architectures. In one example of the disclosed technology, a method of performing a branch in a block-based architecture can include executing one or more instructions of a first instruction block using a first core of the block-based architecture. The method can include, before the first instruction block is committed, initiating non-speculative execution of instructions of a second instruction block.
Abstract:
Apparatus and methods are disclosed for implementing bad jump detection in block-based processor architectures. In one example of the disclosed technology, a block-based processor includes one or more block-based processing cores configured to fetch and execute atomic blocks of instructions and a control unit configured to, based at least in part on receiving a branch signal indicating a target location is received from one of the instruction blocks, verify that the target location is a valid branch target.
Abstract:
A microprocessor instruction translator translates a conditional load instruction into at least two microinstructions. An out-of-order execution pipeline executes the microinstructions. To execute a first microinstruction, an execution unit receives source operands from the source registers of a register file and responsively generates a first result using the source operands. To execute a second the microinstruction, an execution unit receives a previous value of the destination register and the first result and responsively reads data from a memory location specified by the first result and provides a second result that is the data if a condition is satisfied and that is the previous destination register value if not. The previous value of the destination register comprises a result produced by execution of a microinstruction that is the most recent in-order previous writer of the destination register with respect to the second microinstruction.
Abstract:
A system that avoids locks by transactionally executing critical sections. The system receives a program which includes critical sections which are protected by locks. The system modifies the program so that the critical sections are executed transactionally without acquiring locks. The program is modified so that: (1) during transactional execution of a critical section, the program first determines if a lock associated with the critical section is held by another process and if so aborts the transactional execution; (2) if the transactional execution completes without encountering an interfering data access from another process, the program commits changes made during the transactional execution and optionally resumes normal non-transactional execution of the program past the critical section; and (3) if an interfering data access from another process is encountered during transactional execution, the program discards changes made during the transactional execution, and attempts to re-execute the critical section.
Abstract:
The present invention relates to a digital signal processing apparatus comprising a plurality of available hardware resource means and a first instruction set means having access to said available hardware resource means, so that at least a part of said hardware resource means execute operations under control of said first instruction set means, and further comprising a second instruction set means having access to only a predetermined limited subset of said plurality of available hardware resource means, so that at least a part of said predetermined limited subset of said hardware resource means execute operations under control of said second instruction set means. Further, the present invention relates to a method for processing digital signals in a digital signal processing apparatus comprising a plurality of available hardware resource means, wherein at least a part of said hardware resource means execute operations under control of a first instruction set, and wherein at least a part of a predetermined limited subset of said plurality of available hardware resource means execute operations under control of a second instruction set having access to only said predetermined limited subset of said hardware resource means.