摘要:
One embodiment of the present invention sets forth a compression status bit cache with deterministic latency for isochronous memory clients of compressed memory. The compression status bit cache improves overall memory system performance by providing on-chip availability of compression status bits that are used to size and interpret a memory access request to compressed memory. To avoid non-deterministic latency when an isochronous memory client accesses the compression status bit cache, two design features are employed. The first design feature involves bypassing any intermediate cache when the compression status bit cache reads a new cache line in response to a cache read miss, thereby eliminating additional, potentially non-deterministic latencies outside the scope of the compression status bit cache. The second design feature involves maintaining a minimum pool of clean cache lines by opportunistically writing back dirty cache lines and, optionally, temporarily blocking non-critical requests that would dirty already clean cache lines. With clean cache lines available to be overwritten quickly, the compression status bit cache avoids incurring additional miss write back latencies.
摘要:
One embodiment of the invention sets forth a mechanism for efficiently processing atomic operations transmitted from multiple general processing clusters to an L2 cache. A tag look-up unit tracks the availability of each cache line in the L2 cache, reserves the necessary cache lines for the atomic operations and transmits the atomic operations to an ALU for processing. The tag look-up unit also increments a reference counter associated with a reserved cache line each time an atomic operation associated with that cache line is received. This feature allows multiple atomic operations associated with the same cache line to be pipelined to the ALU. A ROP unit that includes the ALU may request additional data necessary to process an atomic operation from the L2 cache. Result data is stored in the L2 cache and may also be returned to the general processing clusters.
摘要:
One embodiment of the present invention sets forth a method for implementing ECC protection in an on-chip L2 cache. When data is written to or read from an external memory, logic within the L2 cache is configured to generate ECC check bits and store the ECC check bits in the L2 cache in space typically allocated for storing byte enables. As a result, data stored in the L2 cache may be protected against bit errors without incurring the costs of providing additional storage or complex hardware for the ECC check bits.
摘要:
Systems and methods are disclosed for managing the number of affirmatively associated cache lines related to the different sets of a data cache unit. A tag look-up unit implements two thresholds, which may be configurable thresholds, to manage the number of cache lines related to a given set that store dirty data or are reserved for in-flight read requests. If the number of affirmatively associated cache lines in a given set is equal to a maximum threshold, the tag look-up unit stalls future requests that require an available cache line within that set to be affirmatively associated. To reduce the number of stalled requests, the tag look-up unit transmits a high priority clean notification to a frame buffer logic when the number of affirmatively associated cache lines in a given set approaches the maximum threshold. The frame buffer logic then processes requests associated with that set preemptively.
摘要:
A method for cleaning dirty data in an intermediate cache is disclosed. A dirty data notification, including a memory address and a data class, is transmitted by a level 2 (L2) cache to frame buffer logic when dirty data is stored in the L2 cache. The data classes may include evict first, evict normal and evict last. In one embodiment, data belonging to the evict first data class is raster operations data with little reuse potential. The frame buffer logic uses a notification sorter to organize dirty data notifications, where an entry in the notification sorter stores the DRAM bank page number, a first count of cache lines that have resident dirty data and a second count of cache lines that have resident evict_first dirty data associated with that DRAM bank. The frame buffer logic transmits dirty data associated with an entry when the first count reaches a threshold.
摘要:
One embodiment of the present invention sets forth a method for implementing ECC protection in an on-chip L2 cache. When data is written to or read from an external memory, logic within the L2 cache is configured to generate ECC check bits and store the ECC check bits in the L2 cache in space typically allocated for storing byte enables. As a result, data stored in the L2 cache may be protected against bit errors without incurring the costs of providing additional storage or complex hardware for the ECC check bits.
摘要:
A method for cleaning dirty data in an intermediate cache is disclosed. A dirty data notification, including a memory address and a data class, is transmitted by a level 2 (L2) cache to frame buffer logic when dirty data is stored in the L2 cache. The data classes may include evict first, evict normal and evict last. In one embodiment, data belonging to the evict first data class is raster operations data with little reuse potential. The frame buffer logic uses a notification sorter to organize dirty data notifications, where an entry in the notification sorter stores the DRAM bank page number, a first count of cache lines that have resident dirty data and a second count of cache lines that have resident evict_first dirty data associated with that DRAM bank. The frame buffer logic transmits dirty data associated with an entry when the first count reaches a threshold.
摘要:
One embodiment of the present invention sets forth a compression status cache configured to store compression information for blocks of memory stored within an external memory. A data cache unit is configured to request, in response to a cache miss, compressed data from the external memory based on compression information stored in the compression status bit cache. The compression status for active buffers is dynamically swapped into the compression status cache as needed. Different compression formats may be specified for one or more tiles within an active buffer. One advantage of the disclosed compression status cache is that a lame amount of attached memory may be allocated as compressible memory blocks, without incurring a corresponding die area cost because a portion of the compression status stored off chip in attached memory is cached in the compression status cache.
摘要:
A system and method for buffering intermediate data in a processing pipeline architecture stores the intermediate data in a shared cache that is coupled between one or more pipeline processing units and an external memory. The shared cache provides storage that is used by multiple pipeline processing units. The storage capacity of the shared cache is dynamically allocated to the different pipeline processing units as needed, to avoid stalling the upstream units, thereby improving overall system throughput.
摘要:
A system and method for converting data from one format to another in a processing pipeline architecture. Data is stored in a shared cache that is coupled between one or more clients and an external memory. The shared cache provides storage that is used by multiple clients rather than being dedicated to separately convert the data format for each client. Each client may interface with the memory using a different format, such as a compressed data format. Data is converted to the format expected by the particular client as it is read from the cache and output to the client during a read operation. Bytes of a cache line may be remapped to bytes of an unpack register for output to a naïve client, which may be configured to perform texture mapping operations. Data is converted from the client format to the memory format as it is stored into the cache during a write operation.