Abstract:
A technique for run-time tracking of changes to variables and memory locations during code execution, to increase the efficiency of code execution and to facilitate debugging. In one example embodiment, this is achieved by determining whether a received instruction is a trackable instruction during code execution. A trackable instruction can include one or more trackable variables. The trackable instruction is then decoded, and a track instruction cache and a track variable cache are updated with the decoded trackable instruction and the one or more trackable variables, respectively.
Abstract:
The present invention relates to a processing unit (1) for executing instructions in a computer system and to a method in such a processing unit. According to the present invention a decision is made whether or not to base execution on a value prediction (P), wherein the decision is based on information associated with the estimated time gain of execution based on a correct prediction. According to an embodiment of the present invention the decision regarding whether or not to execute speculatively is based on information (14) regarding whether a cache hit or a cache miss is detected in connection with a load instruction. In an alternative embodiment of the present invention the decision is based on information regarding the dependency depth of the load instruction, i.e. the number of instructions that are dependent on the load.
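The two decision criteria named in this abstract can be sketched as simple predicates. This is only an illustration: the function names and the `min_depth` threshold are hypothetical, and the sketch assumes that a cache miss implies a larger time gain from a correct prediction than a cache hit does.

```python
def speculate_on_cache_miss(cache_hit: bool) -> bool:
    """First embodiment: decide from the hit/miss information of the load."""
    # Assumption: a miss means a long wait for the real value, so a correct
    # prediction saves the most time; on a hit the expected gain is small.
    return not cache_hit


def speculate_on_dependency_depth(dependency_depth: int, min_depth: int = 2) -> bool:
    """Alternative embodiment: decide from how many instructions depend on the load."""
    # min_depth is a hypothetical threshold, not a value taken from the abstract.
    return dependency_depth >= min_depth
```

For example, `speculate_on_dependency_depth(5)` returns `True` under the assumed threshold, while `speculate_on_cache_miss(True)` returns `False`.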
Abstract:
A data cache configured to perform store accesses in a single clock cycle is provided. The data cache speculatively stores data within a predicted way of the cache after capturing the data currently being stored in that predicted way. During a subsequent clock cycle, the cache hit information for the store access validates the way prediction. If the way prediction is correct, then the store is complete. If the way prediction is incorrect, then the captured data is restored to the predicted way. If the store access hits in an unpredicted way, the store data is transferred into the correct storage location within the data cache concurrently with the restoration of data in the predicted storage location. Each store for which the way prediction is correct utilizes a single clock cycle of data cache bandwidth. Additionally, the way prediction structure implemented within the data cache bypasses the tag comparisons of the data cache to select data bytes for the output. Therefore, the access time of the associative data cache may be substantially similar to a direct-mapped cache access time. The present data cache is therefore suitable for high frequency superscalar microprocessors.
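The capture, speculative store, and restore sequence described above can be modelled for a single cache set as follows. This is a behavioural sketch, not the hardware design: the class name, the list-based storage, and the string status values are assumptions made for illustration, and the two clock cycles are modelled as two steps inside one method.

```python
class WayPredictedSet:
    """One cache set with several ways, each holding a tag and a data value."""

    def __init__(self, tags, data):
        self.tags = list(tags)
        self.data = list(data)

    def store(self, tag, value, predicted_way):
        # First cycle: capture the current contents of the predicted way,
        # then speculatively write the store data into it.
        captured = self.data[predicted_way]
        self.data[predicted_way] = value

        # Subsequent cycle: the tag comparison validates the way prediction.
        hit_way = self.tags.index(tag) if tag in self.tags else None
        if hit_way == predicted_way:
            return "correct"  # the store is already complete

        # Wrong prediction: restore the captured data to the predicted way.
        self.data[predicted_way] = captured
        if hit_way is not None:
            # Hit in an unpredicted way: move the store data to the right place.
            self.data[hit_way] = value
            return "hit-other-way"
        return "miss"
```

A correctly predicted store touches the set only once, which corresponds to the single cycle of data cache bandwidth the abstract mentions.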
Abstract:
A method for prefetching structured data, and more particularly a mechanism for observing address references made by a processor, and learning from those references the patterns of accesses made to structured data. Structured data means aggregates of related data such as arrays, records, and data containing links and pointers. When subsequent accesses are made to data structured in the same way, the mechanism generates in advance the sequence of addresses that will be needed for the new accesses. This sequence is utilized by the memory to obtain the data somewhat earlier than the instructions would normally request it, and thereby eliminate idle time due to memory latency while awaiting the arrival of the data.
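The abstract does not say how the learned access patterns are represented. As a minimal sketch, assume a pattern is the list of offsets from the first address touched in a structure; both function names are hypothetical, and pointer-linked data would need more than fixed offsets.

```python
def learn_pattern(addresses):
    """Record observed references as offsets from the first address."""
    base = addresses[0]
    return [addr - base for addr in addresses]


def generate_prefetches(pattern, new_base):
    """Replay a learned pattern against a new structure of the same layout."""
    return [new_base + offset for offset in pattern]
```

Under this assumption, a pattern learned from one record can be replayed against the base address of the next record to produce the addresses to fetch ahead of time.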
Abstract:
An information processing device (1) detects a register interference state, in which a register updated by a preceding instruction is used by a succeeding instruction, for example for the generation of an operand address. When such a state is detected, the operand address generated when the succeeding instruction is executed is stored in association with the address of the succeeding instruction. Execution of a subsequently fetched instruction is then started using, as an estimated address, the operand address that corresponds to the address of that instruction and is retrieved from the stored contents.
Abstract:
A data prediction structure is provided. The data prediction structure stores base addresses and stride values in a prediction array. The base address and the stride value are added to form a data prediction address which is then used to fetch data bytes into a relatively small, relatively fast buffer which may be accessed by the decode stage(s) of the instruction processing pipeline. If the data associated with an operand address calculated by a decode stage resides in the buffer, the clock cycles used to perform the load operation occur before the instruction reaches the execution stage of the instruction processing pipeline. The execution stage clock cycles that are saved may be used to execute other instructions. Additionally, the base address is updated to the address generated by a decode unit each time a basic block is executed, and the stride value is updated when the data prediction address is found to be incorrect. In this way, the data prediction address may be more accurate than a static data prediction address.
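The base-plus-stride prediction and its update rules can be sketched for a single prediction array entry. The class and method names are hypothetical, and the rule for recomputing the stride (actual address minus the previous base) is an assumption; the abstract only states that the stride is updated on a misprediction.

```python
class StrideEntry:
    """One prediction array entry holding a base address and a stride."""

    def __init__(self, base, stride=0):
        self.base = base
        self.stride = stride

    def predict(self):
        # Data prediction address = base address + stride value.
        return self.base + self.stride

    def update(self, actual):
        """Train on the address produced by the decode unit; return True if predicted correctly."""
        correct = self.predict() == actual
        if not correct:
            # Assumed update rule: new stride is the distance from the old base.
            self.stride = actual - self.base
        # The base is always updated to the most recently generated address.
        self.base = actual
        return correct
```

After one misprediction on a regularly strided access sequence, the entry tracks the sequence and each later prediction matches.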
Abstract:
A pipelined CPU executing instructions of variable length, and referencing memory using various data widths. Macroinstruction pipelining is employed (instead of microinstruction pipelining), with queueing between units of the CPU to allow flexibility in instruction execution times. A wide bandwidth is available for memory access, fetching 64-bit data blocks on each cycle. A hierarchical cache arrangement has an improved method of cache set selection, increasing the likelihood of a cache hit. A writeback cache is used (instead of writethrough) and writeback is allowed to proceed even though other accesses are suppressed due to queues being full. A branch prediction method employs a branch history table which records the taken vs. not-taken history of branch opcodes recently used, and uses an empirical algorithm to predict which way the next occurrence of this branch will go, based upon the history table. A floating point processor function is integrated on-chip, with enhanced speed due to a bypass technique; a trial mini-rounding is done on low-order bits of the result, and if correct, the last stage of the floating point processor can be bypassed, saving one cycle of latency. For CALL type instructions, a method for determining which registers need to be saved is executed in a minimum number of cycles, examining groups of register mask bits at one time. Internal processor registers are accessed with short (byte width) addresses instead of full physical addresses as used for memory and I/O references, but off-chip processor registers are memory-mapped and accessed by the same busses using the same controls as the memory and I/O. If a non-recoverable error is detected by ECC circuits in the cache, an error transition mode is entered wherein the cache operates under limited access rules, allowing a maximum of access by the system for data blocks owned by the cache, but yet minimizing changes to the cache data so that diagnostics may be run. Separate queues are provided for the return data from memory and cache invalidates, yet the order of bus transactions is maintained by a pointer arrangement. The bus protocol used by the CPU to communicate with the system bus is of the pended type, with transactions on the bus identified by an ID field specifying the originator, and arbitration for bus grant goes on simultaneously with address/data transactions on the bus.
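The branch history table mentioned in this abstract can be illustrated with a small model. The abstract does not disclose the empirical prediction algorithm, so this sketch substitutes a simple majority vote over the most recent outcomes; the class name, the `history_bits` parameter, and the use of an opaque `branch_key` are all assumptions.

```python
from collections import defaultdict, deque


class BranchHistoryTable:
    """Keeps the last few taken/not-taken outcomes per branch."""

    def __init__(self, history_bits=4):
        # Each entry is a bounded history; old outcomes fall off the front.
        self.history = defaultdict(lambda: deque(maxlen=history_bits))

    def predict(self, branch_key):
        outcomes = self.history[branch_key]
        # Stand-in rule: predict taken only on a strict majority of taken outcomes.
        # An empty or tied history therefore predicts not taken.
        return sum(outcomes) * 2 > len(outcomes)

    def record(self, branch_key, taken):
        self.history[branch_key].append(bool(taken))
```

Any other function of the recorded history could replace the majority vote without changing the table structure.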
Abstract:
Providing loop-invariant value prediction using a predicted values table, and related apparatuses, methods, and computer-readable media are disclosed. In one aspect, an apparatus comprising an instruction processing circuit is provided. The instruction processing circuit is configured to detect a loop body in an instruction stream, and to detect a value-generating instruction within the loop body. The instruction processing circuit determines whether an attribute of the value-generating instruction matches an entry of a predicted values table. If the attribute of the value-generating instruction is determined to be present in the entry of the predicted values table, the instruction processing circuit further determines whether a counter of the entry exceeds an iteration threshold. Responsive to determining that the counter of the entry exceeds the iteration threshold, the instruction processing circuit provides a predicted value in the entry of the predicted values table for execution of at least one dependent instruction.
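The lookup path described above (match an entry, then compare its counter against the iteration threshold) can be sketched as follows. The abstract does not say how the counter is maintained, so the `train` method here is an assumption: it increments the counter when the instruction produces the same value again and resets the entry otherwise. All names are hypothetical.

```python
class PredictedValuesTable:
    """Maps an instruction attribute (e.g. its address) to [value, counter]."""

    def __init__(self, iteration_threshold=2):
        self.iteration_threshold = iteration_threshold
        self.entries = {}

    def lookup(self, attribute):
        """Return a predicted value, or None if no confident prediction exists."""
        entry = self.entries.get(attribute)
        if entry is not None and entry[1] > self.iteration_threshold:
            return entry[0]
        return None

    def train(self, attribute, actual_value):
        # Assumed policy: count consecutive iterations producing the same value.
        entry = self.entries.get(attribute)
        if entry is not None and entry[0] == actual_value:
            entry[1] += 1
        else:
            self.entries[attribute] = [actual_value, 0]
```

With the assumed threshold of 2, a value is only supplied to dependent instructions once the counter has reached 3.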
Abstract:
Techniques are described for determining whether execution of an instruction would require reading more values from a memory cell of a general purpose register (GPR) than a read port of the memory cell would allow. In such a case, the techniques may store, prior to execution of the instruction, one or more values from the memory cell in a separate conflict queue. During execution of the instruction to implement an operation defined by the instruction, one value that is an operand of the operation would be read from the memory cell and another value that is an operand of the operation would be read from the conflict queue.
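The conflict check can be sketched as a planning step that decides, per operand, whether it is read directly from its memory cell or from the conflict queue. The mapping of registers to cells (`reg // regs_per_cell`) and both parameters are hypothetical; the abstract does not describe the GPR layout.

```python
def plan_operand_reads(operand_regs, regs_per_cell=4, reads_per_cell=1):
    """Split operand registers into direct cell reads and conflict-queue reads."""
    direct, queued = [], []
    reads_used = {}  # memory cell index -> reads already assigned to its port
    for reg in operand_regs:
        cell = reg // regs_per_cell  # assumed register-to-cell mapping
        if reads_used.get(cell, 0) < reads_per_cell:
            reads_used[cell] = reads_used.get(cell, 0) + 1
            direct.append(reg)
        else:
            # The read port is exhausted: this value must be copied to the
            # conflict queue before the instruction executes.
            queued.append(reg)
    return direct, queued
```

An instruction whose operands all live in different cells yields an empty queue list, so no pre-execution copy is needed.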
Abstract:
Predicting literal load values using a literal load prediction table, and related circuits, methods, and computer-readable media are disclosed. In one aspect, an instruction processing circuit provides a literal load prediction table containing one or more entries, each comprising an address and a literal load value. Upon detecting a literal load instruction in an instruction stream, the instruction processing circuit determines whether the literal load prediction table contains an entry having an address of the literal load instruction. If so, the instruction processing circuit provides the predicted literal load value stored in the entry to at least one dependent instruction. The instruction processing circuit subsequently determines whether the predicted literal load value matches the actual literal load value loaded by the literal load instruction. If a mismatch exists, the instruction processing circuit initiates a misprediction recovery. The at least one dependent instruction is re-executed using the actual literal load value.
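The predict, verify, and recover flow above can be sketched with a dictionary standing in for the literal load prediction table. The function name and the `dependent` callback (modelling one dependent instruction) are hypothetical, and allocating or correcting a table entry once the actual value is known is an assumption the abstract does not state.

```python
def execute_literal_load(table, load_address, actual_value, dependent):
    """Run one literal load; return (dependent_result, mispredicted)."""
    if load_address not in table:
        # No entry: no prediction is made. Assumed policy: allocate an entry.
        table[load_address] = actual_value
        return dependent(actual_value), False

    # Entry found: the dependent instruction runs early on the predicted value.
    predicted = table[load_address]
    result = dependent(predicted)

    # Later, the predicted value is checked against the value actually loaded.
    if predicted != actual_value:
        # Misprediction recovery: correct the entry (assumed) and re-execute
        # the dependent instruction with the actual literal load value.
        table[load_address] = actual_value
        return dependent(actual_value), True
    return result, False
```

The returned flag makes the cost of a misprediction visible: the dependent instruction runs twice in that case.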