摘要:
Instruction-level parallelism in software pipelined loops is exploited by predicting future register rotations. A processor includes an architected current frame marker register and at least one unarchitected frame marker register. Register rotation prediction is achieved by setting the register rotation of future iterations of a software loop to be a function of the unarchitected frame marker registers. True data dependencies remain, but the dependencies caused solely by register renaming are removed. Dynamic predication is used to predicate instructions from future iterations, allowing them to be squashed if dependencies are later found. The register renaming that results from the prediction can be included in instructions in a buffer, or a renaming stage in an execution pipeline can perform the renaming.
摘要:
A conjugate processor includes an instruction set architecture (ISA) visible portion having a main pipeline, and an h-flow portion having an h-flow pipeline. The binary executed on the conjugate processor includes an essential portion that is executed on the main pipeline and a non-essential portion that is executed on the h-flow pipeline. The non-essential portion includes hint calculus that is used to provide hints to the main pipeline. The conjugate processor also includes a conjugate mapping table that maps triggers to h-flow targets. Triggers can be instruction attributes, data attributes, state attributes or event attributes. When a trigger is satisfied, the h-flow code specified by the target is executed in the h-flow pipeline.
摘要:
Methods and apparatus for improving system performance using redundant arithmetic are disclosed. In one embodiment, one or more dependency chains are formed. A dependency chain may comprise of two or more instructions. A first instruction may generate a result in a redundant form. A second instruction may accept the result from the first instruction as a first input operand. The instructions in the dependency chain may execute separately from instructions not in the dependency chain.
摘要:
A conjugate processor includes an instruction set architecture (ISA) visible portion having a main pipeline, and an h-flow portion having an h-flow pipeline. The binary executed on the conjugate processor includes an essential portion that is executed on the main pipeline and a non-essential portion that is executed on the h-flow pipeline. The non-essential portion includes hint calculus that is used to provide hints to the main pipeline. The conjugate processor also includes a conjugate mapping table that maps triggers to h-flow targets. Triggers can be instruction attributes, data attributes, state attributes or event attributes. When a trigger is satisfied, the h-flow code specified by the target is executed in the h-flow pipeline.
摘要:
A conjugate processor includes an instruction set architecture (ISA) visible portion having a main pipeline, and an h-flow portion having an h-flow pipeline. The binary executed on the conjugate processor includes an essential portion that is executed on the main pipeline and a non-essential portion that is executed on the h-flow pipeline. The non-essential portion includes hint calculus that is used to provide hints to the main pipeline. The conjugate processor also includes a conjugate mapping table that maps triggers to h-flow targets. Triggers can be instruction attributes, data attributes, state attributes or event attributes. When a trigger is satisfied, the h-flow code specified by the target is executed in the h-flow pipeline.
摘要:
A method and apparatus for selectively storing a register stack onto a register stack backing store is disclosed. In one embodiment, a non-exclusive boundary is determined enclosing registers that were actually used (e.g. written to) by a function. The description of that boundary is saved, and only the contents of the registers within the boundary are saved to register stack backing store as part of a spill operation. When the function is later restored, the description of the boundary is recalled and used to support the loading of just those registers from the register stack backing store as part of a fill operation.
摘要:
In one implementation of the invention, a computer implemented method used in compiling a program includes identifying a covering load, which may be one of a set of covering loads, and a redundant load. The covering load and the redundant load have a first and second load type, respectively. The first and the second load type each may be one of a group of load types including a regular load and at least one speculative-type load. In one implementation, the group of load types includes at least one check-type load. One implementation of the invention is in a machine readable medium.
摘要:
Methods and apparatus to provide cache management in managed runtime environments are described. In one embodiment, a controller comprises logic to determine an update frequency for an object in the runtime environment and assigning the object to an unshared cache line when the update frequency exceeds an update frequency threshold. Other embodiments are also described.