Abstract:
A computer system includes a main memory, at least one processor, and at least one level of cache. The system maintains reference history data for each addressable page in memory, preferably in a page table. The reference history data is preferably used to determine which cacheable sub-units of the page should be prefetched to the cache. The reference history data is preferably an up/down counter that is incremented if the cacheable sub-unit is loaded into cache and referenced by the processor, and decremented if the sub-unit is loaded into cache but not referenced before being cast out. The reference counter thus expresses an approximate likelihood, based on recent history, that the sub-unit will be referenced in the near future.
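A minimal C sketch of such a per-sub-unit reference counter, assuming a 2-bit saturating counter, eight sub-units per page, and a fixed prefetch threshold (the counter width, SUBUNITS_PER_PAGE, and threshold are illustrative, not from the source):

```c
#include <stdbool.h>
#include <stdint.h>

#define SUBUNITS_PER_PAGE  8   /* illustrative: cacheable sub-units per page */
#define COUNTER_MAX        3   /* illustrative: 2-bit saturating counter */
#define PREFETCH_THRESHOLD 2   /* illustrative threshold */

/* Hypothetical page-table fragment holding one counter per sub-unit. */
typedef struct {
    uint8_t ref_history[SUBUNITS_PER_PAGE];
} page_history_t;

/* Called when a sub-unit is cast out of the cache: count up if the
 * processor referenced it while resident, count down otherwise. */
void update_history(page_history_t *ph, int subunit, bool was_referenced)
{
    uint8_t *c = &ph->ref_history[subunit];
    if (was_referenced) {
        if (*c < COUNTER_MAX) (*c)++;
    } else {
        if (*c > 0) (*c)--;
    }
}

/* On a page reference, prefetch only the sub-units whose recent history
 * suggests they are likely to be referenced soon. */
bool should_prefetch(const page_history_t *ph, int subunit)
{
    return ph->ref_history[subunit] >= PREFETCH_THRESHOLD;
}
```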
Abstract:
A multiple parallel pipeline digital processing apparatus has the capability to substitute a second pipeline for a first pipeline in the event that a failure is detected in the first. Preferably, a redundant pipeline is shared by multiple primary pipelines. Preferably, the pipelines are located physically adjacent to one another in an array. A pipeline failure causes data to be shifted one position within the array of pipelines to bypass the failing pipeline, so that each pipeline has only two sources of data, a primary and an alternate. Preferably, the selection logic controlling the choice between a primary and an alternate source of pipeline data is integrated with other pipeline operand selection logic.
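A behavioral C sketch of the shift-by-one source selection, assuming three primary pipelines sharing one redundant pipeline and at most one failure at a time (the array size and single-failure assumption are illustrative):

```c
#include <stdio.h>

#define NUM_PIPES 4           /* illustrative: pipes 0..2 primary, pipe 3 redundant */
#define NO_STREAM (-1)

/* Returns which operand stream pipeline slot `slot` should accept, given
 * the index of the single failed pipeline (-1 if none).  Each slot has
 * only two possible sources, its primary stream (same index) or its
 * alternate (the stream one position lower), so the selection stays a
 * simple 2:1 choice that can fold into existing operand-select logic. */
int stream_for_slot(int failed_pipe, int slot)
{
    if (slot == failed_pipe)
        return NO_STREAM;                 /* failing pipeline is bypassed */
    if (failed_pipe >= 0 && slot > failed_pipe)
        return slot - 1;                  /* alternate: shifted one position */
    if (slot == NUM_PIPES - 1)
        return NO_STREAM;                 /* spare stays idle with no failure */
    return slot;                          /* primary source */
}

int main(void)
{
    for (int f = -1; f < NUM_PIPES - 1; f++) {
        printf("failed=%2d:", f);
        for (int s = 0; s < NUM_PIPES; s++)
            printf(" slot%d<-%d", s, stream_for_slot(f, s));
        printf("\n");
    }
    return 0;
}
```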
Abstract:
A register file bit includes a primary latch and a secondary latch with a feedback path, plus a context switch mechanism that allows a fast context switch when execution changes from one thread to the next. A bit value for a second thread of execution is stored in the primary latch, then transferred to the secondary latch. The bit value for a first thread of execution is then written to the primary latch. When a context switch is needed (when the first thread stalls and the second thread needs to begin execution), the register file bit can switch from the first thread to the second thread in a single clock cycle. The register file bit contains a backup latch inside the register file itself, so only minimal extra wire paths to or from the existing register file are needed.
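A behavioral C model of one such bit cell, where a swap between the primary and secondary fields stands in for the single-cycle hardware exchange (the type and function names are illustrative):

```c
#include <stdbool.h>

/* Behavioral model of one register file bit: the primary latch holds the
 * active thread's value; the secondary (backup) latch holds the inactive
 * thread's value inside the same bit cell. */
typedef struct {
    bool primary;    /* value visible to the executing thread */
    bool secondary;  /* shadowed value for the other thread */
} rf_bit_t;

/* Preload, mirroring the abstract's sequence: thread 1's value enters the
 * primary latch and is transferred to the secondary latch, then thread 0's
 * value is written to the primary latch. */
void preload(rf_bit_t *b, bool thread0_val, bool thread1_val)
{
    b->primary = thread1_val;
    b->secondary = b->primary;   /* transfer primary -> secondary */
    b->primary = thread0_val;
}

/* Context switch in a single step: swap primary and secondary so the
 * waiting thread's value becomes visible immediately. */
void context_switch(rf_bit_t *b)
{
    bool tmp = b->primary;
    b->primary = b->secondary;
    b->secondary = tmp;
}
```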
Abstract:
A method and apparatus for D-cache miss prediction and scheduling is provided. In one embodiment, execution of an instruction in a processor is scheduled. The processor may have at least one cascaded delayed execution pipeline unit having two or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The method includes receiving an issue group of instructions, determining if a first instruction in the issue group resulted in a cache miss during a previous execution of the first instruction, and if so, scheduling the first instruction to be executed in a pipeline in which execution is delayed with respect to another pipeline in the cascaded delayed execution pipeline unit.
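A C sketch of the scheduling decision, assuming a single miss-history bit per instruction and pipelines numbered from least to most delayed (both assumptions are for illustration only):

```c
#include <stdbool.h>

#define NUM_PIPES 4   /* illustrative: pipe 0 least delayed ... pipe 3 most delayed */

/* One history bit per instruction (e.g. kept alongside the instruction)
 * marking whether its previous execution took a D-cache miss. */
typedef struct {
    bool missed_before;
} instr_info_t;

/* A predicted-miss instruction goes to the most-delayed pipeline so the
 * cache miss latency overlaps the pipeline's built-in issue delay; other
 * instructions default to the least-delayed pipeline. */
int pick_pipeline(const instr_info_t *in)
{
    return in->missed_before ? NUM_PIPES - 1 : 0;
}
```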
Abstract:
A method, completion table and processor for tracking a larger number of outstanding instructions. The completion table may include a plurality of entries, where each entry tracks a consecutive number of outstanding instructions. Each entry may be configured to store an instruction address and an identification of the first of the consecutive outstanding instructions. Because each entry tracks a consecutive number of outstanding instructions, such as a cache line's worth, while storing only the instruction address and the identification of the first instruction in that group, the completion table may track a larger number of outstanding instructions without increasing its size.
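A C sketch of one such entry, assuming eight instructions per group and fixed 4-byte instructions (both illustrative assumptions; the source does not specify either):

```c
#include <stdint.h>

#define GROUP_SIZE 8   /* illustrative: instructions tracked per entry,
                          e.g. one cache line's worth */

/* One completion-table entry covers GROUP_SIZE consecutive outstanding
 * instructions while storing only the address and identifier of the
 * first; the n-th instruction in the group is implicit. */
typedef struct {
    uint64_t first_addr;   /* instruction address of the first in the group */
    uint16_t first_itag;   /* identification of the first instruction */
    uint8_t  finished;     /* one completion bit per instruction in the group */
} completion_entry_t;

/* Derive the address of the n-th instruction in the entry's group,
 * assuming fixed 4-byte instructions (an assumption, not from the source). */
uint64_t addr_of(const completion_entry_t *e, unsigned n)
{
    return e->first_addr + 4u * n;
}
```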
Abstract:
A processor includes primary threads of execution that may simultaneously issue instructions, and one or more backup threads. When a primary thread stalls, the contents of its instruction buffer may be switched with the instruction buffer for a backup thread, thereby allowing the backup thread to begin execution. This design allows two primary threads to issue simultaneously, permitting overlap of instruction pipeline latencies. It further allows a fast switch to a backup thread when a primary thread stalls, thereby providing significantly improved instruction throughput in the processor.
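A minimal C sketch of the buffer exchange, modeling the fast hardware switch as a pointer swap (the ibuf_t layout and its depth are hypothetical):

```c
#define IBUF_SLOTS 8   /* hypothetical instruction-buffer depth */

typedef struct {
    unsigned insns[IBUF_SLOTS];  /* fetched instruction slots */
    int count;
} ibuf_t;

/* On a primary-thread stall, exchange the stalled thread's instruction
 * buffer with a backup thread's buffer so the backup thread can begin
 * issuing immediately; the pointer swap models the hardware exchange. */
void switch_to_backup(ibuf_t **primary, ibuf_t **backup)
{
    ibuf_t *tmp = *primary;
    *primary = *backup;
    *backup = tmp;
}
```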
Abstract:
Methods and systems for reducing cache miss rates are disclosed. Embodiments may include a computer system with one or more processors, each coupled with a private cache. Embodiments selectively enable and implement a cache re-allocation scheme for cache lines of the private caches based upon a workload or an expected workload for the processors. In particular, a cache miss rate monitor may count the number of cache misses for each processor. A cache miss rate comparator compares the cache miss rates to determine whether one or more of the processors have significantly higher cache miss rates than the average within a processor module or overall. If so, cache requests from those processors are forwarded to private caches that have lower cache miss rates and hold the least recently used cache lines.
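A C sketch of the comparator's decision, assuming per-processor miss counters and a fixed significance margin (NUM_PROCS and the margin are illustrative; the real scheme would also consult LRU state when choosing lines to displace):

```c
#define NUM_PROCS 4   /* illustrative: processors sharing a module */

/* A processor is considered "hot" if its miss count exceeds the module
 * average by a fixed margin; its requests are then directed at the
 * private cache with the lowest miss count, the one most likely to hold
 * least-recently-used lines it can afford to give up. */
int pick_victim_cache(const unsigned misses[NUM_PROCS], int requester,
                      unsigned margin)
{
    unsigned long sum = 0;
    for (int i = 0; i < NUM_PROCS; i++)
        sum += misses[i];
    unsigned avg = (unsigned)(sum / NUM_PROCS);

    if (misses[requester] <= avg + margin)
        return requester;            /* not significantly hotter: stay local */

    int best = requester;
    for (int i = 0; i < NUM_PROCS; i++)
        if (i != requester && misses[i] < misses[best])
            best = i;
    return best;                     /* spill into the coolest private cache */
}
```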
Abstract:
A self-repairable processor that provides a reliable computing result without increasing the footprint of the on-chip devices. The processor has a plurality of data registers connected to two identical functional units, only one of which is enabled for computing at a time, the two functional units being placed in a chip area defined at most by the data paths needed for one functional unit. When an error condition is detected in the active functional unit, the processor disables the failing functional unit and enables its duplicate.
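A behavioral C sketch, modeling the two identical functional units as function pointers and the error detector as a flag (all names, the trivial adder units, and the retry-on-spare behavior are illustrative assumptions):

```c
#include <stdbool.h>

/* Two identical functional units (trivial adders, for illustration). */
static int alu_a(int a, int b) { return a + b; }
static int alu_b(int a, int b) { return a + b; }

typedef int (*alu_fn)(int, int);

typedef struct {
    alu_fn units[2];   /* duplicate functional units in one data-path area */
    int    active;     /* index of the currently enabled unit */
} repairable_alu_t;

static repairable_alu_t the_alu = { { alu_a, alu_b }, 0 };

/* Execute on the active unit; on a detected error, disable it, enable the
 * duplicate, and re-execute so the result stays reliable. */
int execute(repairable_alu_t *r, int a, int b, bool *error_detected)
{
    int result = r->units[r->active](a, b);
    if (*error_detected) {
        r->active ^= 1;                       /* switch to the duplicate */
        *error_detected = false;
        result = r->units[r->active](a, b);   /* re-execute on the spare */
    }
    return result;
}
```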
Abstract:
A method and apparatus for minimizing unscheduled D-cache miss pipeline stalls is provided. In one embodiment, execution of an instruction in a processor is scheduled. The processor may have at least one cascaded delayed execution pipeline unit having two or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The method includes receiving an issue group of instructions, determining if a first instruction in the issue group is a load instruction, and if so, scheduling the first instruction to be executed in a pipeline in which execution is not delayed with respect to another pipeline in the cascaded delayed execution pipeline unit.
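A C sketch of this complementary placement policy, assuming one issue group maps onto an equal number of pipelines numbered from least to most delayed (the group size and numbering are assumptions for illustration):

```c
#include <stdbool.h>

#define GROUP_SIZE 4   /* illustrative: instructions per common issue group */

typedef struct { bool is_load; } instr_t;

/* Assign pipelines within one issue group: loads take the lowest-numbered
 * (least-delayed) free pipelines so an unpredicted D-cache miss overlaps
 * as much of the downstream delay as possible; other instructions fill
 * the remaining, more-delayed pipelines.  Pipeline p is assumed to be
 * delayed relative to pipeline p-1. */
void schedule_group(const instr_t group[GROUP_SIZE], int pipe_of[GROUP_SIZE])
{
    bool taken[GROUP_SIZE] = { false };

    for (int i = 0; i < GROUP_SIZE; i++) {       /* place loads first */
        if (!group[i].is_load) continue;
        for (int p = 0; p < GROUP_SIZE; p++)
            if (!taken[p]) { pipe_of[i] = p; taken[p] = true; break; }
    }
    for (int i = 0; i < GROUP_SIZE; i++) {       /* then everything else */
        if (group[i].is_load) continue;
        for (int p = 0; p < GROUP_SIZE; p++)
            if (!taken[p]) { pipe_of[i] = p; taken[p] = true; break; }
    }
}
```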
Abstract:
Embodiments of the present invention provide a method and apparatus for prefetching instruction lines. In one embodiment, the method includes fetching a first instruction line from a level 2 cache; extracting, from the first instruction line, an address identifying a first data line containing data targeted by a data access instruction in the first instruction line or a different instruction line; and prefetching the first data line from the level 2 cache using the extracted address.
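A C sketch of the fetch-plus-prefetch step, with stub functions standing in for the L2 interfaces (the iline_t layout and both stubs are assumptions, not a real API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: an instruction line fetched from the level 2 cache
 * carries a recorded address of a data line that a data access instruction
 * in this line (or a nearby line) targeted on a prior execution. */
typedef struct {
    uint8_t  bytes[128];        /* the instruction words themselves */
    uint64_t target_data_addr;  /* extracted data-line address */
    bool     target_valid;
} iline_t;

/* Stubs standing in for the real cache hierarchy. */
static iline_t line_storage;

static iline_t *l2_fetch_iline(uint64_t addr)
{
    (void)addr;
    return &line_storage;       /* stub: would access the L2 */
}

static void l2_prefetch_dline(uint64_t addr)
{
    (void)addr;                 /* stub: would queue an L2 data prefetch */
}

/* Fetch an instruction line and, in the same step, prefetch the data line
 * named by its recorded address, hiding the data access latency. */
iline_t *fetch_with_data_prefetch(uint64_t iaddr)
{
    iline_t *il = l2_fetch_iline(iaddr);
    if (il && il->target_valid)
        l2_prefetch_dline(il->target_data_addr);
    return il;
}
```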