摘要:
Scan logic which tracks the relative age of stores with respect to a particular load (or of loads with respect to a particular store) allows at processor to hold younger stores until the completion of older loads (or to hold younger loads until completion of older stores). Embodiments of propagate-kill style lookahead scan logic or of tree-structured, hierarchically-organized scan logic constructed in accordance with the present invention provide store older and load older indications with very few gate delays, even in processor embodiments adapted to concurrently evaluate large numbers of operations. Operating in conjunction with the scan logic, address matching logic allows the processor to more precisely tailor its avoidance of load-store (or store-load) dependencies. In a processor having a load unit and a store unit, a load/store execution control system allows load and store instructions to execute generally out-of-order with respect to each other while enforcing data dependencies between the load and store instructions.
摘要:
Scheduler logic which tracks the relative age of stores with respect to a particular load (and of loads with respect to a particular store) allows a load-store execution controller constructed in accordance with the present invention to hold younger stores until the completion of older loads (and to hold younger loads until completion of older stores). Address matching logic allows a load-store execution controller constructed in accordance with the present invention to avoid load-store (and store-load) dependencies. Propagate-kill scan chains supply the relative age indications of loads with respect to stores (and of stores with respect to loads).
摘要:
A processing system includes sequential entries for storing operations of different types and a scan chain which can identify an operation of a first type which follows after an operation of a second type. The first and second types can be identical so that the scan chain identifies the second operation of a particular type in the sequence. The scan chain includes single-entry "generate", "propagate", "kill", and "only" terms which control a scan bit. Conceptually, if the "only" term is not asserted, an entry of the second type generates the scan bit and asserts the "only" term. After the "only" term is asserted, further generation of the scan bit is inhibited. Each entry either propagates the scan bit to the next entry or if the entry is of the first type, kills the scan bit and identifies itself as the selected entry. Look-ahead logic determines group terms from single-entry terms to indicate whether a scan bit would be generated, propagated, or killed by a group of entries. Accordingly, the scan bit is not required to propagate through every entry, and scans can be performed quickly.
摘要:
A superscalar processor includes a scheduler which selects operations for out-of-order execution. The scheduler contains storage and control logic which is partitioned into entries corresponding to operations to be executed, being executed, or completed. The scheduler issues operations to execution units for parallel pipelined execution, selects and provides operands as required for execution, and acts as a reorder buffer keeping the results of operations until the results can be safely committed. The scheduler is tightly coupled to execution pipelines and provides a large parallel path for initial operation stages which minimize pipeline bottlenecks and hold ups into and out of the execution units. The scheduler monitors the entries to determine when all operands required for execution of an operation are available and provides required operands to the execution units. The operands selected can be from a register file, a scheduler entry, or an execution unit. Control logic in the entries is linked together into scan chains which identify operations and operands for execution.
摘要:
A superscalar processor includes a scheduler which selects operations for out-of-order execution. The scheduler contains storage and control logic which is partitioned into entries corresponding to operations to be executed, being executed, or completed. The scheduler issues operations to execution units for parallel pipelined execution, selects and provides operands as required for execution, and acts as a reorder buffer keeping the results of operations until the results can be safely committed. The scheduler is tightly coupled to execution pipelines and provides a large parallel path for initial operation stages which minimize pipeline bottlenecks and hold ups into and out of the execution units. The scheduler monitors the entries to determine when all operands required for execution of an operation are available and provides required operands to the execution units. The operands selected can be from a register file, a scheduler entry, or an execution unit. Control logic in the entries is linked together into scan chains which identify operations and operands for execution.
摘要:
A circuit includes a sequential entries for storing objects of different types and a scan chain which can identify an object of a first type which follows after an object of a second type. The first and second types can be identical so that the scan chain identifies the second object of a particular type in the sequence. The scan chain includes single-entry "generate", "propagate", "kill", and "only" terms which control a scan bit. Conceptually, if the "only" term is not asserted, an entry of the second type generates the scan bit and asserts the "only" term. After the "only" term is asserted, further generation of the scan bit is inhibited. Each entry either propagates the scan bit to the next entry or if the entry is of the first type, kills the scan bit and identifies itself as the selected entry. Look-ahead logic determines group terms from single-entry terms to indicate whether a scan bit would be generated, propagated, or killed by a group of entries. Accordingly, the scan bit is not required to propagate through every entry, and scans can be performed quickly.
摘要:
Accordingly, a prefetch instruction mechanism is desired for implementing a prefetch instruction which is non-faulting, non-blocking, and non-modifying of architectural register state. Advantageously, a prefetch mechanism described herein is provided largely without the addition of substantial complexity to a load execution unit. In one embodiment, the non-faulting attribute of the prefetch mechanism is provided though use of the vector decode supplied Op sequence that activates an alternate exception handler. The non-modifying of architectural register state attribute is provided (in an exemplary embodiment) by first decoding a PREFETCH instruction to an Op sequence targeting a scratch register wherein the scratch register has scope limited to the Op sequence corresponding to the PREFETCH instruction. Although described in the context of a vector decode embodiment, the prefetch mechanism can be implemented with hardware decoders and suitable modifications to decode paths will be appreciated by those of skill in the art based on the description herein. Similarly, although in one particular embodiment such a scratch register is architecturally defined to read as a NULL (or zero) value, any target for the Op sequence that is not part of the architectural state of the processor would also be suitable. Finally, in one embodiment the non-blocking attribute is provided by the Op sequence completing (without waiting for return of fill data) upon posting of a cache fill request to load logic of a data cache. In this way, LdOps which follow in a load pipe are not stalled by a prefetch-related miss and can instead execute concurrently with the prefetch-related line fill.
摘要:
A processor which includes tags indicating memory addresses for instructions advancing through pipeline stages of the processor and which includes an instruction decoder having a store target address buffer allows a self-modifying code handling system to detect store operations writing into the instruction stream and trigger a self-modifying code fault. In one embodiment of a seIf-modifying code handling system, a store pipe is coupled to a data cache to commit results of a store operation to a memory subsystem. The store pipe supplies a store operation target address indication on commitment of a store operation result. A scheduler includes ordered Op entries for Ops decoded from instructions and includes corresponding first address tags covering memory addresses for the instructions. First comparison logic is coupled to the store pipe and to the first address tags to trigger self-modifying code fault handling means in response to a match between the store operation target address and one of the first address tags. An instruction decoder is coupled between the instruction cache and the scheduler. The instruction decoder includes instruction buffer entries and second address tags associated with the instruction buffer entries. Second comparison logic is coupled to the store pipe and to the second address tags to trigger the self-modifying code fault handling means in response to a match between the store operation target address and one of the second address tags.
摘要:
A superscalar microprocessor includes a scheduler which contains storage for information related to operations and scan logic for selecting operations for out-of-order execution by a set of execution units. To provide fast operation, the selection is made without regard for the availability of operands which are required for execution of the operation but may be unavailable pending completion of an operation. An operand forward stage, which follows the issue stage, selects sources for an operand which may be a register file or a sourcing operation in the scheduler, completed or not. The scheduler contains all information describing the sourcing operations and forwards an operand value and information indicating the state of a sourcing operations. The state information indicates whether the sourcing operation is complete and execution of the issued operation can continue. The state also indicates a wait until the sourcing operation will complete. If the wait is too long, the issued operation is bumped so that another operation can be executed. This reduces pipeline hold ups and increase execution unit utilization.
摘要:
A semiconductor memory array with Built-in Self-Repair (BISR) includes redundancy circuits associated with failed row address stores to drive redundant row word lines, thereby obviating the supply and normal decoding of a substitute addresses. NOT comparator logic compares a failed row address generated and stored by BISR circuits to a row address supplied to the memory array. A TRUE comparator configured in parallel with the NOT comparator simultaneously compares defective row address signal to the supplied row address. Since NOT comparison is performed quickly in dynamic logic without setup and hold time constraints, timing impact on a normal (non-redundant) row decode path is negligible, and since TRUE comparison, though potentially slower than NOT comparison, itself identifies a redundant row address and therefore need not employ an N-bit address to selected word-line decode, redundant row addressing is rapid and does not adversely degrade performance of a self-repaired semiconductor memory array. By providing redundancy handling at the predecode circuit level, rather than at a preliminary address substitution stage, access times to a BISR memory array in accordance with the present invention are improved.