Abstract:
A method of improving the performance of a computer processor by recognizing that two consecutive register instructions can be executed simultaneously and executing the two instructions simultaneously while generating a single data address and while performing exception checking on a single data address. During an instruction fetch process, two consecutive instructions are tested to determine if both are either register load instructions or register save instructions. If both instructions are load or save register instructions, the corresponding data addresses are tested to see if both data addresses are in the same double word. If both data addresses are in the same double word, then the instructions are executed simultaneously. Only one data address generation is required and exception processing is performed on only one data address. In one example embodiment, a simplified test rapidly ensures that both data addresses are in the same double word, but also requires the base addresses to be at an even word boundary. In a second embodiment, where the processor includes an alignment test as a separate test, an even simpler test rapidly ensures that both data addresses are in the same double word without checking alignment.
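The same-double-word condition can be illustrated with a minimal sketch. The word size (4 bytes), the double-word size (8 bytes), and the exact form of the simplified test are assumptions made for illustration; the abstract does not specify them, and the function names are hypothetical.

```python
WORD_BYTES = 4          # assumed word size
DOUBLE_WORD_BYTES = 8   # assumed double-word size


def same_double_word(addr_a: int, addr_b: int) -> bool:
    """Full test: both byte addresses fall inside the same aligned double word."""
    return addr_a // DOUBLE_WORD_BYTES == addr_b // DOUBLE_WORD_BYTES


def can_pair_simplified(base: int, offset_a: int, offset_b: int) -> bool:
    """Assumed simplified test that avoids comparing full addresses.

    It requires the shared base address to sit on an even word boundary,
    the first offset to be double-word aligned, and the second offset to be
    exactly one word higher. It is conservative: it may reject pairs that
    the full test would accept.
    """
    base_on_even_word = base % DOUBLE_WORD_BYTES == 0
    offsets_adjacent = (
        offset_a % DOUBLE_WORD_BYTES == 0
        and offset_b == offset_a + WORD_BYTES
    )
    return base_on_even_word and offsets_adjacent


# Two loads from base 0x1000 at offsets 8 and 12 can be paired.
print(can_pair_simplified(0x1000, 8, 12))
# A misaligned base is rejected even though 0x1008 and 0x100C share a double word.
print(can_pair_simplified(0x1004, 4, 8), same_double_word(0x1008, 0x100C))
```

The second call shows the trade-off the abstract hints at: the simplified test is cheap but only accepts pairs whose base address is suitably aligned.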
Abstract:
The present application describes embodiments of a method and apparatus for concurrently accessing dirty bits in a cache. One embodiment of the apparatus includes a cache configurable to store a plurality of lines. The lines are grouped into a plurality of subsets of the plurality of lines. This embodiment of the apparatus also includes a plurality of dirty bits associated with the plurality of lines and first circuitry configurable to concurrently access the plurality of dirty bits associated with at least one of the plurality of subsets of lines.
Abstract:
The present invention provides a method and apparatus for allocating space in a unified cache. The method may include partitioning the unified cache into a first portion of lines that only store copies of instructions retrieved from a memory and a second portion of lines that only store copies of data retrieved from the memory.
Abstract:
A cache subsystem apparatus and method of operating therefor are disclosed. In one embodiment, a cache subsystem includes a cache memory divided into a plurality of sectors each having a corresponding plurality of cache lines. Each of the plurality of sectors is associated with a sector dirty bit that, when set, indicates at least one of its corresponding plurality of cache lines is storing modified data of any other location in a memory hierarchy including the cache memory. The cache subsystem further includes a cache controller configured to, responsive to initiation of a power down procedure, determine only in sectors having a corresponding sector dirty bit set which of the corresponding plurality of cache lines is storing modified data.
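A small software model can illustrate why sector dirty bits shorten the power-down scan. The class names, the per-line dirty flags, and the `lines_inspected` counter are hypothetical and exist only for this sketch; real hardware would implement the same idea in the cache controller.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sector:
    line_dirty: List[bool]   # one flag per cache line in the sector
    dirty: bool = False      # sector dirty bit: set if any line was modified


class SectoredCache:
    def __init__(self, num_sectors: int, lines_per_sector: int) -> None:
        self.sectors = [
            Sector([False] * lines_per_sector) for _ in range(num_sectors)
        ]
        self.lines_inspected = 0  # counts per-line checks, for illustration

    def write(self, sector: int, line: int) -> None:
        """Model a store: mark the line and its sector as dirty."""
        self.sectors[sector].line_dirty[line] = True
        self.sectors[sector].dirty = True

    def lines_to_write_back(self) -> List[Tuple[int, int]]:
        """On power down, inspect lines only in sectors whose dirty bit is set."""
        modified = []
        for s_idx, sector in enumerate(self.sectors):
            if not sector.dirty:
                continue  # skip the whole sector without touching its lines
            for l_idx, is_dirty in enumerate(sector.line_dirty):
                self.lines_inspected += 1
                if is_dirty:
                    modified.append((s_idx, l_idx))
        return modified


cache = SectoredCache(num_sectors=4, lines_per_sector=8)
cache.write(1, 3)
cache.write(1, 5)
cache.write(3, 0)
print(cache.lines_to_write_back(), cache.lines_inspected)
```

In this example only two of the four sectors are scanned, so 16 lines are inspected instead of 32.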
Abstract:
A cache system includes a plurality of first caches at a first level of a cache hierarchy and a second cache, coupled to each of the plurality of first caches, at a second level of the cache hierarchy that is lower than the first level. The second cache enforces a cache line replacement policy in which the second cache selects a cache line for replacement based in part on whether the cache line is present in any of the plurality of first caches and in part on another factor.
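One way to picture such a replacement policy is sketched below. The abstract does not name the "other factor"; this sketch assumes it is recency of use (an LRU timestamp), and the `Line` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    tag: str
    in_first_level: bool  # True if any first-level cache also holds this line
    last_used: int        # assumed second factor: smaller means older


def select_victim(candidates: List[Line]) -> Line:
    """Prefer lines absent from every first-level cache, then the oldest one.

    Tuples sort element by element, and False sorts before True, so lines
    not present in any first-level cache are always considered first.
    """
    return min(candidates, key=lambda line: (line.in_first_level, line.last_used))


ways = [
    Line("A", in_first_level=True, last_used=1),
    Line("B", in_first_level=False, last_used=5),
    Line("C", in_first_level=False, last_used=3),
]
print(select_victim(ways).tag)
```

Here line A is the oldest, but it is still held by a first-level cache, so the sketch evicts C instead.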
Abstract:
The present invention provides a method and apparatus for allocating cache bandwidth to multiple processors. One embodiment of the method includes delaying, at a local device associated with a local cache, a first cache probe from a non-local device to the local cache following a second cache probe from the non-local device that matches a third cache probe from the local device.
Abstract:
A method of detecting and correcting errors in a memory subsystem of a computer is described. The method includes beginning a write operation of N data bits to a memory, generating M check bits from the N data bits, writing the N data bits and the M check bits to the memory, reading the N data bits and M check bits from the memory, generating X syndrome bits from the N data bits and the M check bits, and using the X syndrome bits to detect and correct errors. Preferably, the M check bits are generated also from A address bits corresponding to the location in memory to which the N data bits and M check bits are to be written.
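The write, read, and syndrome steps above can be sketched with a toy single-error-correcting code. Everything specific here is an assumption for illustration: N = 8 data bits, M = X = 4, the Hamming-style column assignment, and folding the address in as a single parity bit mapped to an otherwise unused syndrome. The abstract does not describe the actual code used.

```python
DATA_BITS = 8
# Assumed syndrome pattern for each data bit: non-powers-of-two in 1..12.
DATA_COLUMNS = [3, 5, 6, 7, 9, 10, 11, 12]
CHECK_COLUMNS = (1, 2, 4, 8)   # a flipped check bit yields a one-hot syndrome
ADDRESS_COLUMN = 15            # assumed pattern contributed by odd address parity


def parity(value: int) -> int:
    return bin(value).count("1") & 1


def check_bits(data: int, address: int) -> int:
    """Generate the check bits from the data bits and the address bits."""
    check = 0
    for bit, column in enumerate(DATA_COLUMNS):
        if (data >> bit) & 1:
            check ^= column
    if parity(address):
        check ^= ADDRESS_COLUMN
    return check


def write_word(memory: dict, address: int, data: int) -> None:
    memory[address] = (data, check_bits(data, address))


def read_word(memory: dict, address: int):
    """Return (data, status), using the syndrome to detect and correct errors."""
    data, stored_check = memory[address]
    syndrome = check_bits(data, address) ^ stored_check
    if syndrome == 0:
        return data, "ok"
    if syndrome in DATA_COLUMNS:
        return data ^ (1 << DATA_COLUMNS.index(syndrome)), "corrected data bit"
    if syndrome in CHECK_COLUMNS:
        return data, "corrected check bit"
    return data, "uncorrectable"


memory = {}
write_word(memory, 0x10, 0xB2)
data, check = memory[0x10]
memory[0x10] = (data ^ 0x10, check)   # inject a single data-bit error
print(read_word(memory, 0x10))
```

Because the address takes part in check-bit generation, a word that ends up at the wrong location can produce a nonzero syndrome on read. In this toy version that only holds when the two addresses differ in parity; a practical code would cover the address bits more thoroughly.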
Abstract:
Apparatus and method embodiments for dynamically allocating cache space in a multi-threaded execution environment are disclosed. In some embodiments, a processor includes a cache shared by each of a plurality of processor cores and/or each of a plurality of threads executing on the processor. The processor further includes a cache allocation circuit configured to dynamically allocate space in the cache provided to each of the plurality of processor cores based on their respective usage patterns. The cache allocation circuit may track cache usage by each of the processor cores/threads using subsets of usage bits and counters configured to update states of the usage bits. The cache allocation circuit may track the usage of cache space by the processor cores/threads and may allocate more space to those that exhibit more usage of the cache.
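The idea of giving more space to heavier users can be sketched as a proportional split of cache ways. The abstract does not state the allocation rule, so the rule below (one guaranteed way per core, the rest shared in proportion to a per-core usage count) and the function name `allocate_ways` are assumptions made for illustration.

```python
from typing import List


def allocate_ways(usage: List[int], total_ways: int) -> List[int]:
    """Split cache ways among cores in proportion to their usage counts.

    Every core keeps at least one way. The remaining ways are divided
    proportionally, and rounding leftovers go to the cores with the
    largest remainders.
    """
    cores = len(usage)
    if total_ways < cores:
        raise ValueError("need at least one way per core")
    allocation = [1] * cores
    spare = total_ways - cores
    total_usage = sum(usage)
    if total_usage == 0:
        for i in range(spare):          # no usage data: spread evenly
            allocation[i % cores] += 1
        return allocation
    # Integer arithmetic avoids floating-point rounding surprises.
    shares = [u * spare // total_usage for u in usage]
    remainders = [u * spare % total_usage for u in usage]
    for i in range(cores):
        allocation[i] += shares[i]
    leftover = spare - sum(shares)
    by_remainder = sorted(range(cores), key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        allocation[i] += 1
    return allocation


# Core 0 uses the cache three times as much as core 1; cores 2 and 3 are idle.
print(allocate_ways([30, 10, 0, 0], total_ways=16))
```

With these inputs the busiest core receives 10 of the 16 ways, while the idle cores keep only their guaranteed single way.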
Abstract:
We report methods, integrated circuit devices, and fabrication processes relating to power management transitions of multiple compute units sharing a cache. One method includes indicating that a first compute unit of a plurality of compute units of an integrated circuit device is attempting to enter a low power state, determining if the first compute unit is the only compute unit of the plurality in a normal power state, and in response to determining the first compute unit is the only compute unit in the normal power state: saving a state of a shared cache unit of the integrated circuit device, flushing at least a portion of a cache of the shared cache unit, repeating the flushing until either a second compute unit exits the low power state or the cache is completely flushed, and permitting the first compute unit to enter the low power state.
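The power-down sequence can be sketched as a small simulation. The function name, the string power states, the list of dirty lines, and the `second_unit_woke` callback are all hypothetical stand-ins for hardware signals that the abstract does not detail.

```python
from typing import Callable, Dict, List


def enter_low_power(
    first: str,
    power_state: Dict[str, str],
    dirty_lines: List[int],
    second_unit_woke: Callable[[], bool],
    lines_per_pass: int = 2,
) -> List[str]:
    """Simulate one compute unit entering a low power state; return an action log."""
    log = [f"{first} requests low power"]
    others_normal = [
        unit for unit, state in power_state.items()
        if unit != first and state == "normal"
    ]
    if not others_normal:
        # Last unit still awake: the shared cache must be saved and drained.
        log.append("save shared cache unit state")
        # Flush a portion at a time so a wake-up can interrupt the process.
        while dirty_lines and not second_unit_woke():
            count = min(lines_per_pass, len(dirty_lines))
            for _ in range(count):
                dirty_lines.pop()
            log.append(f"flush {count} lines")
    power_state[first] = "low"
    log.append(f"{first} enters low power")
    return log


states = {"cu0": "normal", "cu1": "low"}
pending = [0, 1, 2, 3, 4]
for entry in enter_low_power("cu0", states, pending, lambda: False):
    print(entry)
```

Flushing in small portions is what allows the loop to stop early when another compute unit wakes up and needs the cache contents again.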