Abstract:
Method and apparatus implementing smart store operations with conditional ownership requests. One aspect includes a method implemented in a multi-core processor, the method comprising: receiving a conditional read for ownership (CondRFO) from a requester in response to an execution of an instruction to modify a target cache line (CL) with a new value, the CondRFO identifying the target CL and the new value; determining from a local cache a local CL corresponding to the target CL; determining a local value from the local CL; comparing the local value with the new value; setting a coherency state of the local CL to (S)hared when the local value is the same as the new value; setting the coherency state of the local CL to (I)nvalid when the local value is different from the new value; and sending a response and a copy of the local CL to the requester. Other embodiments include an apparatus configured to perform the actions of the methods.
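A minimal sketch of the responder-side behavior described above, in C, assuming hypothetical names and types (cache_line_t, coherency_state_t, handle_cond_rfo) and a 64-byte line; it illustrates the compare-then-share-or-invalidate logic, not the patented implementation:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical types -- names are illustrative, not from the source. */
    typedef enum { STATE_M, STATE_E, STATE_S, STATE_I } coherency_state_t;

    typedef struct {
        uint64_t tag;
        coherency_state_t state;
        uint8_t data[64];            /* one cache line */
    } cache_line_t;

    typedef struct {
        cache_line_t copy;           /* copy of the responder's local line */
        bool value_matched;          /* true -> the requester's store is redundant */
    } cond_rfo_response_t;

    /* Handle a conditional read-for-ownership at the responding core.
     * 'offset'/'size' locate the target value within the line; 'new_value'
     * is the value the requester intends to store. */
    cond_rfo_response_t handle_cond_rfo(cache_line_t *local_cl,
                                        size_t offset, size_t size,
                                        const void *new_value)
    {
        cond_rfo_response_t resp;

        /* Compare the local value with the requester's new value. */
        bool same = (memcmp(local_cl->data + offset, new_value, size) == 0);

        /* Same value: the store would be redundant, so the line can stay Shared.
         * Different value: the requester must gain ownership, so invalidate. */
        local_cl->state = same ? STATE_S : STATE_I;

        resp.copy = *local_cl;       /* send a copy of the local line back */
        resp.value_matched = same;
        return resp;
    }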
Abstract:
Apparatus and methods are disclosed for performing memory operation instructions in a block-based processor architecture. In certain examples of the disclosed technology, a block-based processor core coupled to memory includes a control unit configured to issue one or more memory operations encoded in an instruction block allocated to the core and to commit the instruction block when execution of the instruction block is complete, a memory store queue configured to cache one or more operands for the one or more memory operations, where a result of performing the memory operations is not architecturally visible unless the instruction block is committed by the control unit, and a memory interface configured to store the cached operands in the memory responsive to the instruction block committing. In some examples, the block-based processor core supports memory synchronization using load linked and store conditional instructions.
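A minimal sketch in C, assuming hypothetical names (store_queue_t, sq_issue_store, sq_commit) and a fixed-size queue, of how buffered stores could be kept architecturally invisible until the instruction block commits:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical store-queue sketch: stores issued inside an instruction
     * block are buffered here and written to memory only if the block commits. */
    #define SQ_ENTRIES 16

    typedef struct {
        uint64_t addr;
        uint64_t data;
        bool valid;
    } sq_entry_t;

    typedef struct {
        sq_entry_t entries[SQ_ENTRIES];
        size_t count;
    } store_queue_t;

    /* Buffer a store; nothing reaches memory yet. */
    bool sq_issue_store(store_queue_t *sq, uint64_t addr, uint64_t data)
    {
        if (sq->count >= SQ_ENTRIES)
            return false;                        /* queue full: stall the block */
        sq->entries[sq->count++] = (sq_entry_t){ addr, data, true };
        return true;
    }

    /* On block commit, drain the queue to memory through the memory interface. */
    void sq_commit(store_queue_t *sq, void (*mem_write)(uint64_t, uint64_t))
    {
        for (size_t i = 0; i < sq->count; i++)
            if (sq->entries[i].valid)
                mem_write(sq->entries[i].addr, sq->entries[i].data);
        sq->count = 0;
    }

    /* On abort (block not committed), the buffered stores are discarded,
     * so they never become architecturally visible. */
    void sq_abort(store_queue_t *sq) { sq->count = 0; }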
Abstract:
Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
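A minimal sketch in C of the store-vector/store-mask check, assuming hypothetical names (mem_order_state_t, may_issue) and one bit per memory access instruction identifier:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical sketch: each bit position corresponds to one memory
     * access instruction in the block, in identifier order. */
    typedef struct {
        uint32_t store_mask;     /* bits set for instructions that are stores */
        uint32_t executed;       /* store vector: bit set once an instruction
                                    has executed */
    } mem_order_state_t;

    /* Record that memory access instruction 'lsid' has executed. */
    static inline void mark_executed(mem_order_state_t *s, unsigned lsid)
    {
        s->executed |= (1u << lsid);
    }

    /* A memory instruction with identifier 'lsid' may issue once every
     * earlier *store* in the block has executed: mask the store vector
     * with the store mask and check that all lower-numbered store bits
     * are set. */
    static inline bool may_issue(const mem_order_state_t *s, unsigned lsid)
    {
        uint32_t earlier        = (1u << lsid) - 1;      /* bits below lsid */
        uint32_t earlier_stores = s->store_mask & earlier;
        return (s->executed & earlier_stores) == earlier_stores;
    }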
Abstract:
Apparatus and methods are disclosed for nullifying memory store instructions and one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain an instruction identification for a memory access instruction of a plurality of memory access instructions and a register identification of at least one of a plurality of registers, based on first and second target fields of the nullification instruction. The at least one register and the memory access instruction associated with the instruction identification are nullified. Based on the nullified memory access instruction, a subsequent memory access instruction is executed.
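A minimal sketch in C, assuming a hypothetical encoding (null_insn_t, block_state_t) in which the two target fields carry a memory instruction identifier and a register number, of how the nullification could be recorded so that a subsequent memory access instruction can issue:

    #include <stdint.h>

    /* Hypothetical encoding: two target fields identify a memory access
     * instruction (by its load/store identifier) and a register to nullify. */
    typedef struct {
        uint8_t mem_lsid;        /* first target field: memory instruction id */
        uint8_t reg_id;          /* second target field: register id          */
    } null_insn_t;

    typedef struct {
        uint32_t mem_nullified;  /* one bit per memory access instruction     */
        uint64_t reg_nullified;  /* one bit per register                      */
        uint32_t mem_executed;   /* vector of executed memory instructions    */
    } block_state_t;

    /* Execute the nullification instruction: the targeted memory instruction
     * counts as executed (without touching memory), and the targeted register
     * write is suppressed, so later accesses ordered after them can issue. */
    void exec_nullify(block_state_t *st, null_insn_t insn)
    {
        st->mem_nullified |= (1u   << insn.mem_lsid);
        st->mem_executed  |= (1u   << insn.mem_lsid);  /* unblock later accesses */
        st->reg_nullified |= (1ull << insn.reg_id);
    }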
Abstract:
Apparatus and methods are disclosed for nullifying one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain a register identification of at least one of a plurality of registers, based on a target field of the nullification instruction. A write to the at least one register associated with the register identification is nullified. The nullification instruction is in a first instruction block of the plurality of instruction blocks. Based on the nullified write to the at least one register, a subsequent instruction is executed from a second, different instruction block.
Abstract:
Apparatus and methods are disclosed for initiating instruction block execution using a register access instruction (e.g., a register Read instruction). In some examples of the disclosed technology, a block-based computing system can include a plurality of processor cores configured to execute at least one instruction block. The at least one instruction block encodes a data-flow instruction set architecture (ISA). The ISA includes a first plurality of instructions and a second plurality of instructions. One or more of the first plurality of instructions specify at least a first target instruction without specifying a data source operand. One or more of the second plurality of instructions specify at least a second target instruction and a data source operand that specifies a register.
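A minimal sketch in C of the two instruction kinds, using hypothetical field names and widths (df_insn_t, reg_read_insn_t); the register Read injects a register value into the block's dataflow, which can initiate execution of the block:

    #include <stdint.h>

    /* First kind: names only its target instruction(s); its operands arrive
     * over the dataflow network from producer instructions, so no data
     * source operand field is needed. */
    typedef struct {
        uint8_t  opcode;
        uint16_t target0;        /* target instruction + operand slot */
        uint16_t target1;
    } df_insn_t;

    /* Second kind (e.g., a register Read): names a target instruction and a
     * data source operand that specifies a register. */
    typedef struct {
        uint8_t  opcode;
        uint8_t  src_reg;        /* data source operand: register number */
        uint16_t target0;        /* instruction that receives the value  */
    } reg_read_insn_t;

    uint64_t regfile[64];        /* hypothetical global register file */

    /* Executing the register Read sends the register's value to its target,
     * which can initiate dataflow execution of the instruction block. */
    void exec_reg_read(reg_read_insn_t r,
                       void (*send_operand)(uint16_t target, uint64_t value))
    {
        send_operand(r.target0, regfile[r.src_reg]);
    }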
Abstract:
Embodiments of a method and apparatus for implementing and maintaining a stack of predicate values with stack synchronization instructions. In one embodiment the apparatus is an out-of-order hardware/software co-designed processor including instructions to explicitly manage the predicate register stack to maintain stack consistency across branches of execution that push a variable number of predicate values onto the predicate stack. In one embodiment the stack-based predicate register implementation enables early branch calculation and early branch misprediction recovery via early renaming of predicate registers.
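A minimal sketch in C of a predicate stack with an explicit synchronization operation, assuming hypothetical names (pred_stack_t, pred_push, pred_sync); the synchronization instruction restores a known stack depth at a join point after branch paths that push different numbers of predicates:

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical predicate-stack sketch (names are illustrative). */
    #define PRED_STACK_DEPTH 32

    typedef struct {
        bool   preds[PRED_STACK_DEPTH];
        size_t top;                      /* number of live predicates */
    } pred_stack_t;

    /* Push a newly computed predicate value. */
    void pred_push(pred_stack_t *ps, bool value)
    {
        ps->preds[ps->top++] = value;
    }

    /* Stack-synchronization instruction: each branch path may have pushed a
     * different number of predicates, so the join point explicitly restores
     * the stack to a known depth, keeping the stack consistent across paths. */
    void pred_sync(pred_stack_t *ps, size_t depth_at_join)
    {
        ps->top = depth_at_join;         /* discard path-specific predicates */
    }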
Abstract:
Apparatus for data processing and a method of data processing are provided, according to which the processing circuitry of the apparatus can access a memory system and execute data processing instructions in one context of multiple contexts which it supports. When the processing circuitry executes a barrier instruction, the resulting access ordering constraint may be limited to being enforced for accesses which have been initiated by the processing circuitry when operating in an identified context, which may for example be the context in which the barrier instruction has been executed. This provides a separation between the operation of the processing circuitry in its multiple possible contexts and in particular prevents delays in the completion of the access ordering constraint, for example those relating to accesses to high-latency regions of memory, from affecting the timing sensitivities of other contexts.
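A minimal sketch in C, assuming hypothetical names (pending_access_t, barrier_complete), of a barrier whose ordering constraint is enforced only for accesses initiated by the identified context:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical sketch: each outstanding access is tagged with the context
     * that initiated it; a context-limited barrier waits only for accesses
     * bearing the identified context. */
    typedef struct {
        uint32_t context_id;
        bool     complete;
    } pending_access_t;

    /* The barrier's ordering constraint is satisfied once every outstanding
     * access from 'barrier_ctx' has completed; accesses issued by other
     * contexts (e.g., to high-latency memory regions) do not delay it. */
    bool barrier_complete(const pending_access_t *pending, size_t n,
                          uint32_t barrier_ctx)
    {
        for (size_t i = 0; i < n; i++)
            if (pending[i].context_id == barrier_ctx && !pending[i].complete)
                return false;
        return true;
    }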
Abstract:
Runtime selection and modification of conditional expressions in a computing system has broad applicability in application areas involving deployments of large numbers of network-connected handsets and other devices, as well as in high-availability computing environments and essential computing services. The invention describes the deferred evaluation of conditional statements in a trusted execution context such that the problem of spoofed return codes is eliminated. The system allows for any set of relevant attributes to be considered in the conditional evaluation. The executable statements associated with the returned evaluation of the conditional are also dynamic and are selected at runtime.
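A minimal sketch in C, assuming hypothetical names (attribute_t, trusted_eval_and_run), of deferring both the conditional evaluation and the selection of the executable statements to the trusted side, so that no spoofable return code crosses back to the caller:

    #include <stddef.h>

    /* Hypothetical sketch: the untrusted caller supplies the relevant
     * attributes and the candidate actions; the trusted side selects the
     * conditional expression and the action at runtime and runs the action
     * itself, so no boolean result is returned that could be spoofed. */
    typedef struct {
        const char *name;
        const char *value;
    } attribute_t;

    typedef void (*action_fn)(void);

    /* Trusted-side entry point (assumed to run inside the trusted execution
     * context): 'evaluate' is the conditional expression selected at runtime
     * over any set of attributes; the matching executable statements are
     * invoked directly in the trusted context. */
    void trusted_eval_and_run(int (*evaluate)(const attribute_t *, size_t),
                              const attribute_t *attrs, size_t n_attrs,
                              const action_fn *actions, size_t n_actions)
    {
        int idx = evaluate(attrs, n_attrs);        /* deferred evaluation */
        if (idx >= 0 && (size_t)idx < n_actions)
            actions[idx]();                        /* dynamically selected code */
    }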