摘要:
A method and system is disclosed for software manipulation of hardware prediction mechanism in a data processor with software prediction. The hardware branch prediction mechanism is enhanced with at least two bits for path prediction. These bits are settable by a software and are capable of overriding the hardware branch prediction mechanism. Branch prediction information is encoded into a branch instruction in the software. This information includes a pre-determined value for each bit. Finally, a branch path of said instruction is predicted based on the value of the bits.
摘要:
A method of performing operations to a link stack including the step of performing a Pop operation from the link stack which includes the substeps of storing a first pointer value to the link stack, the first pointer value being the value of a pointer to the link stack before the Pop operation, and storing a first address including a first tag popped from the link stack. The method further includes the step of performing a Push operation to the link stack which includes the substeps of storing a second address including a second tag being Pushed into the link stack and storing a second pointer to the link stack, the second pointer being the value of the pointer to the link stack after the Push operation. The method additionally provides for the recovering of the link stack following an instruction flush which includes the substeps of comparing the first pointer value and the second value, comparing the first tag and the second tag, and replacing an address at the top of the link stack with the first address when the first and second pointers match and the first and second tags match.
摘要:
Branch prediction logic is enhanced to provide a monitoring function for certain conditions which indicate that the use of separate BHTs and predicted target address cache would provide better results for branch prediction. The branch prediction logic responds to the occurrence of the monitored condition by logically splitting the BHTs and count cache so that half of the address space is allocated to a first thread and the second half is allocated to the next thread. Prediction-generated addresses that belong to the first thread are then directed to the half of the array that is allocated to that thread and prediction-generated addresses that belong to the second thread are directed to the next half of the array that is allocated to the second thread. In order to split the array, the highest order bit in the array is utilized to uniquely identify addresses of the first and the second threads.
摘要:
An SMT system has a single thread mode and an SMT mode. Instructions are alternately selected from two threads every clock cycle and loaded into the IFAR in a three cycle pipeline of the IFU. If a branch predicted taken instruction is detected in the branch prediction circuit in stage three of the pipeline, then in the single thread mode a calculated address from the branch prediction circuit is loaded into the IFAR on the next clock cycle. If the instruction in the branch prediction circuit detects a branch predicted taken in the SMT mode, then the selected instruction address is loaded into the IFAR on the first clock cycle following branch predicted taken detection. The calculated target address is fed back and loaded into the IFAR in the second clock cycle following branch predicted taken detection. Feedback delay effectively switches the pipeline from three stages to four stages.
摘要:
A set-associative I-cache that enables early cache hit prediction and correct way selection when the processor is executing instructions of multiple threads having similar EAs. Each way of the I-cache comprises an EA Directory (EA Dir), which includes a series of thread valid bits that are individually assigned to one of the multiple threads. Particular ones of the thread valid bits are set in each EA Dir to indicate when an instruction block the thread is cached within the particular way with which the EA Dir is associated. When a cache line request for a particular thread is received, a cache hit is predicted when the EA of the request matches the EA in the EA Dir and the cache line is selected from the way associated with the EA Dir who has the thread valid bit for that thread set. Early way selection is thus achieved since the way selection only requires a check of the thread valid bits.
摘要:
A method of prefetching addresses includes the step of accessing a stored instruction using a current address. During the access using the current address, a target address is accessed in a branch target address cache. A stored instruction associated with the target address accessed from the branch target address cache is prefetched and the branch target address is indexed with selected bits from the address accessed from the branch target address cache.
摘要:
Improved conditional branch instruction prediction by detecting branch aliasing in a branch history table. Each entry in an aliasing table is associated with only one of a plurality of conditional branch instructions tracked by the branch history table. Prior to executing a conditional branch instruction, outcome of the execution of the conditional branch instruction is predicted utilizing the branch history table entry associated with the conditional branch instruction. Outcome of the execution of the conditional branch instruction is also predicted utilizing the aliasing table entry associated with the conditional branch instruction. Branch aliasing is detected by comparing the prediction made utilizing the branch history table with the prediction made utilizing the aliasing table. In response to the predictions being different, a determination is made that branch aliasing occurred, and the prediction made utilizing the aliasing table is utilized for predicting the outcome of the execution of the conditional branch instruction.
摘要:
A field is defined in branch instructions which is interpreted by software as “Hint” bits and these bits are used to signal the processor of special circumstances that may arise when doing speculative branch instruction execution to enable better branch address prediction accuracy and a reduction in link stack corruption which improves overall execution times. A programmer or compiler determines if a branch instruction usage fits in the context for a Hint action. If so, the compiler or programmer, using assembly/machine language, sets Hint bits in the branch instruction when it is compiled. If the branch is later speculatively executed, the processor decodes the Hint bits and executes and a hardware action corresponding the decode of the Hint bits. These Hints include four specific Hint actions, however, the field reserved for Hint bits is five bit wide reserving up to thirty-two specific Hint cases may be specified. These Hint cases (or Hint bits) may be interpreted differently for each type of branch instruction supported.
摘要:
A set of helper thread binaries is created to retrieve data used by a set of main thread binaries. The set of helper thread binaries and the set of main thread binaries are partitioned according to common instruction boundaries. As a first partition in the set of main thread binaries executes within a first core, a second partition in the set of helper thread binaries executes within a second core, thus “warming up” the cache in the second core. When the first partition of the main completes execution, a second partition of the main core moves to the second core, and executes using the warmed up cache in the second core.
摘要:
A method and data processing system for managing running of instructions in a program. A processor of the data processing system receives a monitoring instruction of a monitoring unit. The processor determines if at least one secondary thread of a set of secondary threads is available for use as an assist thread. The processor selects the at least one secondary thread from the set of secondary threads to become the assist thread in response to a determination that the at least one secondary thread of the set of secondary threads is available for use as an assist thread. The processor changes profiling of running of instructions in the program from the main thread to the assist thread.