摘要:
Method, system and computer program product for generating effective addresses in a data processing system. A method, in a data processing system, for generating an effective address includes generating a first portion of the effective address by calculating a first plurality of effective address bits of the effective address, and generating a second portion of the effective address by guessing a second plurality of effective address bits of the effective address. By intelligently guessing a plurality of the effective address bits that form the effective address, the effective address can be generated and sent to a translation unit more quickly than in a system in which all the effective address bits of the effective address are calculated. The method and system is particularly suitable for generating effective addresses in a CAM-based effective address translation design in a multi-threaded environment.
摘要:
Method, system and computer program product for generating effective addresses in a data processing system. A method, in a data processing system, for generating an effective address includes generating a first portion of the effective address by calculating a first plurality of effective address bits of the effective address, and generating a second portion of the effective address by guessing a second plurality of effective address bits of the effective address. By intelligently guessing a plurality of the effective address bits that form the effective address, the effective address can be generated and sent to a translation unit more quickly than in a system in which all the effective address bits of the effective address are calculated. The method and system is particularly suitable for generating effective addresses in a CAM-based effective address translation design in a multi-threaded environment.
摘要:
Method, system and computer program product for generating effective addresses in a data processing system. A method, in a data processing system, for generating an effective address includes generating a first portion of the effective address by calculating a first plurality of effective address bits of the effective address, and generating a second portion of the effective address by guessing a second plurality of effective address bits of the effective address. By intelligently guessing a plurality of the effective address bits that form the effective address, the effective address can be generated and sent to a translation unit more quickly than in a system in which all the effective address bits of the effective address are calculated. The method and system is particularly suitable for generating effective addresses in a CAM-based effective address translation design in a multi-threaded environment.
摘要:
A method, apparatus and algorithm for quickly detecting an address match in a deeply pipelined processor design in a manner that may be implemented using a minimum of physical space in the critical area of the processor. The address comparison is split into two parts. The first part is a fast, partial address match comparator system. The second part is a slower, full address match comparator system. If a partial match between a requested address and a registry address is detected, then execution of the program or set of instructions requesting the address is temporarily suspended while a full address match check is performed. If the full address match check results in a full match between the requested address and a registry address, then the program or set of instructions is interrupted and stopped. Otherwise, the program or set of instructions continues execution.
摘要:
A method, system, and computer program product for supporting multiple fetch requests to the same congruence class in an n-way set associative cache. Responsive to receiving an incoming fetch instruction at a load/store unit, outstanding valid fetch entries in the n-way set associative cache that have the same cache congruence class as the incoming fetch instruction are identified. SetIDs in used by these identified outstanding valid fetch entries are determined. A resulting setID is assigned to the incoming fetch instruction based on the identified setIDs, wherein the resulting setID assigned is a setID not currently in use by the outstanding valid fetch entries. The resulting setID for the incoming fetch instruction is written in a corresponding entry in the n-way set associative cache.
摘要:
A method, system, and computer program product for enhancing performance of an in-order microprocessor with long stalls. In particular, the mechanism of the present invention provides a data structure for storing data within the processor. The mechanism of the present invention comprises a data structure including information used by the processor. The data structure includes a group of bits to keep track of which instructions preceded a rejected instruction and therefore will be allowed to complete and which instructions follow the rejected instruction. The group of bits comprises a bit indicating whether a reject was a fast or slow reject; and a bit for each cycle that represents a state of an instruction passing through a pipeline. The processor speculatively continues to execute a set bit's corresponding instruction during stalled periods in order to generate addresses that will be needed when the stall period ends and normal dispatch resumes.
摘要:
A method, apparatus, and computer program product are disclosed in a data processing system for sharing data in a cache among multiple threads in a simultaneous multi-threaded (SMT) processor. The SMT processor executes multiple threads concurrently during each clock cycle. The cache is dynamically allocated for use among the multiple threads. Portions of the cache are capable of being designated to store private data that is used exclusively by only a first one of the threads. The portions of the cache are capable of being designated to store shared data that can be used by any one of the multiple threads. The size of the portions can be changed dynamically during execution of the threads.
摘要:
A method and system for maintaining a best-case demand redispatch of an instruction to allow for maximizing the time a rejected thread may execute in lookahead execution mode, while maintaining the smallest L1 cache miss penalty supported by the memory subsystem. In response to a demand miss, a load/store unit sends a fetch request to the next level cache. The cache line of the demand miss is examined to identify the critical sector. Once the critical sector is identified, a best-case data return time is determined based on the fastest time the next level cache is able to return the critical sector of the cache line. The load/store unit then sends a speculative warning to the dispatch unit to coincide with the best-case data return, wherein the speculative warning prepares the dispatch unit to resend the instruction for execution as soon as data is available to the processor core.
摘要:
A method and system for maintaining a best-case demand redispatch of an instruction to allow for maximizing the time a rejected thread may execute in lookahead execution mode, while maintaining the smallest L1 cache miss penalty supported by the memory subsystem. In response to a demand miss, a load/store unit sends a fetch request to the next level cache. The cache line of the demand miss is examined to identify the critical sector. Once the critical sector is identified, a best-case data return time is determined based on the fastest time the next level cache is able to return the critical sector of the cache line. The load/store unit then sends a speculative warning to the dispatch unit to coincide with the best-case data return, wherein the speculative warning prepares the dispatch unit to resend the instruction for execution as soon as data is available to the processor core.
摘要:
An apparatus and method are disclosed for detecting multiple hits in CAM arrays. A binary address value is stored for each entry of the CAM array and is output to identify the matching entry for a single hit. However, to facilitate multiple hit detection, both the true and complement components of this address are stored and output to determine whether or not a multiple hit occurred. If a multiple hit occurs (e.g., more than one address location has been matched), all the bits that make up the binary address and the complement will not be complements of each other and a multiple hit condition can be detected by XORing each bit of an address location value with the complement of that address location value. If the XORed bits are equal to “1”, then a single hit has occurred. Otherwise, a multiple hit has occurred.