摘要:
Embodiments are provided in which cache update is implemented by using a counter table having a plurality of entries to keep track of different modified cache lines of a cache of a processor. If a cache line of the cache is modified by the processor and the original content of the cache line came from a cache of another processor, a counter in the counter table restarts and reaches a predetermined value (e.g., overflows) triggering the broadcast of the modified cache line so that the cache of the other processor can snarf a copy of the modified cache line. As a result, when the other processor reads from a memory address matching that of the cache line, the cache of the other processor already has the most current copy for the matching memory address to feed the processor. Therefore, a cache read miss is avoided and system performance is improved.
摘要:
A method of providing history based done logic includes receiving a cache line in a L2 cache; determining if the cache line has a history of access at least three times on a previous call into the L2 cache; providing the cache line directly to a processor if the history of access was less then the at least three times; and loading the cache line into an L1 cache if the history of access was the at least three times.
摘要:
A method of providing history based done logic for a D-cache includes receiving a D-cache line in an L2 cache; determining if the D-cache line is unprefetchable; aging the D-cache line without a delay if the D-cache line is prefetchable; and aging the D-cache line with a delay if the D-cache line is unprefetchable.
摘要:
One embodiment of the invention provides a processor. The processor generally includes a first and second processor core, each having a plurality of pipelined execution units for executing an issue group of multiple instructions and scheduling logic configured to issue a first issue group of instructions to the first processor core for execution and a second issue group of instructions to the second processor core for execution when the processor is in a first mode of operation and configured to issue one or more vector instructions for concurrent execution on the first and second processor cores when the processor is in a second mode of operation.
摘要:
Embodiments of the invention provide a look-aside-look-aside buffer (LLB) configured to retain a portion of the real addresses in a translation look-aside (TLB) buffer to allow prefetching of data from a cache. A subset of real address bits associated with an effective address may be retrieved relatively quickly from the LLB, thereby allowing access to the cache before the complete address translation is available and reducing cache access latency.
摘要:
A method of providing history based done logic for a I-cache includes receiving an I-cache line in an L2 cache; determining if the I-cache line is unprefetchable; aging the I-cache line without a delay if the I-cache line is prefetchable; and aging the I-cache line with a delay is the I-cache line is unprefetchable.
摘要:
Embodiments of the invention provide a processor for executing instructions. In one embodiment, the processor includes circuitry to receive a load instruction and a store instruction to be executed in the processor and detect a conflict between the load instruction and the store instruction. Detecting the conflict includes determining if load-store conflict information indicates that the load instruction previously conflicted with the store instruction. The load-store conflict information is stored for both the load instruction and the store instruction. The processor further includes circuitry to schedule execution of the load instruction and the store instruction so that execution of the load instruction and the store instruction do not result in a conflict.
摘要:
Embodiments of the invention provide an apparatus of storing branch prediction information. In one embodiment, an integrated circuit device includes a first table for storing local branch prediction information, a second table for storing global branch prediction information, and circuitry. The circuitry is configured to receive a branch instruction and store local branch prediction information for the branch instruction in the first table. The local branch prediction information includes a local predictability value for the local branch prediction information. The circuitry is further configured to store global branch prediction information for the branch instruction in the second table only if the local predictability value is below a threshold value of predictability.
摘要:
Improved techniques for executing instructions in a pipelined manner that may reduce stalls that occur when executing dependent instructions are provided. Stalls may be reduced by utilizing a cascaded arrangement of pipelines with execution units that are delayed with respect to each other. This cascaded delayed arrangement allows dependent instructions to be issued within a common issue group by scheduling them for execution in different pipelines to execute at different times.
摘要:
Embodiments of the invention provide a method and processor for executing instructions. In one embodiment, the method includes receiving a load instruction and a store instruction to be executed in the processor and detecting a conflict between the load instruction and the store instruction. Detecting the conflict includes determining if load-store conflict information indicates that the load instruction previously conflicted with the store instruction. The load-store conflict information is stored for both the load instruction and the store instruction. The method further includes scheduling execution of the load instruction and the store instruction so that execution of the load instruction and the store instruction do not result in a conflict.