Abstract:
Disclosed is a method and apparatus providing the capability to supplement a branch target buffer (BTB) with a recent entry queue. The recent entry queue prevents the unnecessary removal of valuable BTB data from multiple entries in order to make room for another entry. Additionally, the recent entry queue detects when the BTB's startup latency prevents it from asynchronously aiding the microprocessor pipeline as designed, and can delay the pipeline in those situations so that the BTB startup latency can be overcome. Finally, the recent entry queue provides quick access to BTB entries that are accessed in a tight loop pattern, where the throughput of the standalone BTB is unable to match the throughput of the microprocessor execution pipeline. Through the use of the recent entry queue, the modified BTB is capable of processing information at the rate of the execution pipeline, thereby accelerating the execution pipeline.
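The queue's interaction with the BTB can be pictured with a small software model. The following Python sketch is only an approximation; names such as BTBWithRecentQueue, queue_depth, lookup, and install are illustrative assumptions, not taken from the disclosure. It shows a small FIFO of recent entries consulted alongside the main BTB, absorbing new installs so established BTB entries are not displaced unnecessarily, and serving tight-loop lookups quickly.

```python
from collections import OrderedDict

class BTBWithRecentQueue:
    """Illustrative model of a BTB supplemented by a small 'recent entry
    queue' (FIFO) that is checked alongside the main BTB."""

    def __init__(self, btb_entries=2048, queue_depth=8):
        self.btb = OrderedDict()       # larger, slower main structure
        self.btb_entries = btb_entries
        self.recent = OrderedDict()    # small, fast FIFO of recent installs
        self.queue_depth = queue_depth

    def lookup(self, branch_addr):
        # The small queue is consulted first; in tight loops it supplies
        # predictions at pipeline speed without waiting on the main BTB.
        if branch_addr in self.recent:
            return self.recent[branch_addr]
        return self.btb.get(branch_addr)

    def install(self, branch_addr, target):
        # New branches are buffered in the recent queue rather than
        # immediately displacing established BTB entries.
        self.recent[branch_addr] = target
        self.recent.move_to_end(branch_addr)
        if len(self.recent) > self.queue_depth:
            # Only entries aged out of the queue are written to the BTB,
            # so a burst of new branches evicts at most one BTB entry each.
            addr, tgt = self.recent.popitem(last=False)
            if len(self.btb) >= self.btb_entries:
                self.btb.popitem(last=False)   # FIFO-style eviction
            self.btb[addr] = tgt
```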
Abstract:
A memory storage structure includes a memory storage device and a first meta-structure having a first size and operating at a first speed, the first speed being faster than a second speed, for storing meta-information based on information stored in a memory. A second meta-structure is hierarchically associated with the first meta-structure. The second meta-structure has a second size larger than the first size and operates at the second speed, such that faster and more accurate prefetching is provided by coaction of the first and second meta-structures. A method is provided to assemble the meta-information in the first meta-structure, copy this information to the second meta-structure, and prefetch the stored information from the second meta-structure to the first meta-structure ahead of its use.
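One way to picture the two-level arrangement is the following Python sketch, given only as an approximation under assumed interfaces (names such as HierarchicalMetaStructure, record, and prefetch are hypothetical): meta-information is assembled in a small, fast first level, copied into a larger second level, and brought back into the first level ahead of use.

```python
class HierarchicalMetaStructure:
    """Illustrative two-level meta-structure: a small, fast first level and
    a larger, slower second level that backs it, with prefetch between them."""

    def __init__(self, l1_capacity=64):
        self.l1 = {}            # small, fast first meta-structure
        self.l1_capacity = l1_capacity
        self.l2 = {}            # larger, slower second meta-structure

    def record(self, address, meta):
        # Meta-information is assembled in the first level as the associated
        # memory information is used...
        self.l1[address] = meta
        # ...and copied to the second level so it survives first-level
        # replacement.
        self.l2[address] = meta
        if len(self.l1) > self.l1_capacity:
            # Evict the oldest first-level entry (dicts keep insertion order).
            self.l1.pop(next(iter(self.l1)))

    def prefetch(self, upcoming_addresses):
        # Ahead of use, meta-information is moved from the second level back
        # into the first level so it is available at first-level speed.
        for addr in upcoming_addresses:
            if addr not in self.l1 and addr in self.l2:
                self.record(addr, self.l2[addr])
```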
Abstract:
A method provides a microprocessor with the ability to predict data cache content based on the instruction address of the instruction accessing the data cache, allowing the reduction of address-generation interlock scenarios, with the ability to self-correct should the data cache content prediction be incorrect. Content prediction accuracy is kept high through the use of multiple filters. One filter allows predictions to be used only in scenarios where address-generation interlocks are present. A second filter allows predictions to be made only when patterns are detected which suggest a prediction will be correct. The third and final filter further improves prediction coverage by detecting patterns of correct potential predictions and utilizing them in the future when they would otherwise be ignored by the basic prediction mechanism.
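A simplified software model of the filter gating might look like the following Python sketch; the class and method names are assumptions, and the third filter is only approximated here by training the confidence counters even when no prediction was actually consumed, so repeatedly correct potential predictions become usable later.

```python
class CacheContentPredictor:
    """Illustrative model of instruction-address-based data cache content
    prediction gated by multiple filters (all names are assumptions)."""

    def __init__(self, confidence_threshold=2):
        self.history = {}       # instr_addr -> last observed cache content
        self.confidence = {}    # instr_addr -> saturating confidence counter
        self.threshold = confidence_threshold

    def predict(self, instr_addr, agi_pending):
        # Filter 1: only predict when an address-generation interlock would
        # otherwise stall the pipeline.
        if not agi_pending:
            return None
        # Filter 2: only predict when past behaviour suggests the prediction
        # will be correct.
        if self.confidence.get(instr_addr, 0) < self.threshold:
            return None
        return self.history.get(instr_addr)

    def update(self, instr_addr, actual_value):
        # Training happens even when no prediction was issued, so patterns of
        # correct *potential* predictions raise confidence and can be used in
        # the future (a rough stand-in for the third filter).
        correct = self.history.get(instr_addr) == actual_value
        c = self.confidence.get(instr_addr, 0)
        self.confidence[instr_addr] = min(c + 1, 3) if correct else 0
        self.history[instr_addr] = actual_value
```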
Abstract:
A method and apparatus provide the capability to create a dynamically built buffer structure that takes an instruction-address-organized instruction cache and, through the interaction of an asynchronous branch target buffer (BTB) and branch history table (BHT), forms a series of instructions in the buffer structure that resembles a trace cache. By allowing the dynamic creation of a predicted code sequence trace in the buffer structure, based on the past behavior of the instruction code, fetch bandwidth is used efficiently and the instruction cache makes optimal use of area, while reducing the latency penalties associated with taken branches and branches which are predicted in the improper direction.
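The trace-assembly idea can be sketched in Python as follows; the callable interfaces icache_fetch, btb_lookup, and bht_taken are hypothetical stand-ins for the instruction cache, BTB, and BHT, and one instruction per address is assumed purely for simplicity.

```python
def build_predicted_trace(start_addr, icache_fetch, btb_lookup, bht_taken,
                          max_blocks=8):
    """Assemble a trace-like instruction sequence in a buffer by following
    BTB/BHT predictions through the instruction cache.

    Assumed (hypothetical) interfaces, one instruction per address:
      icache_fetch(first, last) -> instructions at addresses first..last
      btb_lookup(addr)          -> (branch_addr, target) of the next predicted
                                   branch at or after addr, or None
      bht_taken(branch_addr)    -> True if the BHT predicts the branch taken
    """
    buffer, addr = [], start_addr
    for _ in range(max_blocks):
        hit = btb_lookup(addr)
        if hit is None:
            # No further predicted branches: stop extending the trace here.
            break
        branch_addr, target = hit
        # Copy instructions up to and including the predicted branch.
        buffer.extend(icache_fetch(addr, branch_addr))
        # Follow the predicted direction so the buffer holds the predicted
        # code path, much like a trace cache line would.
        addr = target if bht_taken(branch_addr) else branch_addr + 1
    return buffer
```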
Abstract:
Disclosed is a method and apparatus providing a microprocessor the ability to reuse data cache content fetched during runahead execution. The data is stored and later retrieved based upon the instruction address of the instruction accessing the data cache. The reuse mechanism allows the reduction of address-generation interlock scenarios, with the ability to self-correct should the stored values be incorrect due to subtleties in the architected state of memory in multiprocessor systems.
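A minimal Python sketch of such a reuse buffer is shown below (names like RunaheadReuseBuffer, record, reuse, and verify are illustrative assumptions): load values observed during runahead are captured keyed by instruction address, supplied again during normal execution, and checked against the architected value so the mechanism can self-correct.

```python
class RunaheadReuseBuffer:
    """Illustrative model: values fetched by loads during runahead execution
    are captured and keyed by instruction address for later reuse."""

    def __init__(self):
        self.entries = {}   # instr_addr -> (mem_addr, value)

    def record(self, instr_addr, mem_addr, value):
        # Populated while the pipeline runs ahead past a stalling miss.
        self.entries[instr_addr] = (mem_addr, value)

    def reuse(self, instr_addr):
        # On normal (architected) execution, the stored value is supplied
        # early, sidestepping an address-generation interlock on the consumer.
        entry = self.entries.get(instr_addr)
        return None if entry is None else entry[1]

    def verify(self, instr_addr, actual_value):
        # Self-correction: the reused value is later compared with the value
        # obtained from the coherent data cache; a mismatch (e.g. another
        # processor stored to that line in the meantime) means the
        # speculation must be discarded.
        entry = self.entries.pop(instr_addr, None)
        return entry is not None and entry[1] == actual_value
```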
Abstract:
Disclosed is a method and apparatus providing the ability to create a multi-level prediction algorithm in which branch predictions beyond the first level of prediction are maintained at a secondary level because the prior level was unable to predict the direction of the stated branch with high accuracy. The secondary level is smaller in size than the upper level through selective filtering, thereby enabling high prediction accuracy for branches while minimizing the amount of hardware required to perform the stated predictions.
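The filtering of poorly predicted branches into a smaller secondary structure can be modeled roughly as in the following Python sketch; the table sizes, saturating counters, and promotion threshold are illustrative assumptions rather than the disclosed design.

```python
class TwoLevelDirectionPredictor:
    """Illustrative model: a primary table of 2-bit counters plus a smaller
    secondary table holding only branches the primary level predicts poorly."""

    def __init__(self, secondary_capacity=256):
        self.primary = {}        # branch_addr -> 2-bit saturating counter
        self.mispredicts = {}    # branch_addr -> primary mispredict count
        self.secondary = {}      # filtered subset: branch_addr -> counter
        self.secondary_capacity = secondary_capacity

    def predict(self, branch_addr):
        # The secondary level overrides the primary for branches it tracks.
        if branch_addr in self.secondary:
            return self.secondary[branch_addr] >= 2
        return self.primary.get(branch_addr, 1) >= 2

    def update(self, branch_addr, taken):
        primary_pred = self.primary.get(branch_addr, 1) >= 2
        c = self.primary.get(branch_addr, 1)
        self.primary[branch_addr] = min(c + 1, 3) if taken else max(c - 1, 0)
        if branch_addr in self.secondary:
            s = self.secondary[branch_addr]
            self.secondary[branch_addr] = min(s + 1, 3) if taken else max(s - 1, 0)
            return
        # Filtering: only branches the primary level repeatedly gets wrong
        # are promoted into the (smaller) secondary structure.
        if primary_pred != taken:
            m = self.mispredicts.get(branch_addr, 0) + 1
            self.mispredicts[branch_addr] = m
            if m >= 2 and len(self.secondary) < self.secondary_capacity:
                self.secondary[branch_addr] = 3 if taken else 0
```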
Abstract:
Disclosed is a method and apparatus providing the capability to prevent particular branches from being written into the BTB, thereby making them non-predictable. By making certain branches detectable only at decode time, branch prediction can run completely asynchronously to decode. By allowing the branch prediction logic to cover as wide a range of branches as possible, the fetching of branch targets well ahead of the branch itself achieves a higher level of precision. This increased level of precision eliminates pipeline stalls between branches and targets where prior concerns about maintaining data integrity within the pipeline of a microprocessor existed.
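A minimal Python sketch of an install-time BTB filter appears below; the filtering policy itself is a placeholder assumption, since the abstract does not specify which branches are excluded, and the function names are hypothetical.

```python
def install_into_btb(btb, branch_addr, target, is_filtered):
    """Illustrative install-time filter for the BTB.

    is_filtered(branch_addr, target) is a placeholder policy predicate: it
    marks branches that should never be written into the BTB and therefore
    remain detectable only at decode time.
    """
    if is_filtered(branch_addr, target):
        # The branch is deliberately kept out of the BTB; the decoder will
        # find it, and the asynchronous branch-prediction logic never has to
        # agree with decode about it.
        return False
    btb[branch_addr] = target
    return True

# Example placeholder policy (an assumption, not from the disclosure):
# exclude branches whose target is simply the next sequential instruction,
# since predicting them provides no fetch redirection benefit.
def next_sequential_filter(branch_addr, target, inst_len=4):
    return target == branch_addr + inst_len
```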