Abstract:
Systems, apparatuses, and methods for dynamically partitioning a memory cache among a plurality of agents are described. A system includes a plurality of agents, a communication fabric, a memory cache, and a lower-level memory. The partitioning of the memory cache among the active data streams of the agents is dynamically adjusted to reduce memory bandwidth and increase power savings across a wide range of applications. A memory cache driver monitors activations and characteristics of the data streams of the system. When a change is detected, the memory cache driver dynamically updates the memory cache allocation policy and quotas for the agents. The quotas specify how much of the memory cache each agent is allowed to use. The updates are communicated to the memory cache controller, which enforces the new policy and quotas for the various agents accessing the memory.
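To make the driver/controller split concrete, here is a minimal sketch in C of the kind of quota recomputation the abstract describes. The interface is assumed: names such as mc_set_quota() and agent_stream_t are hypothetical, and the proportional-to-bandwidth rule is one illustrative policy, not the patented one.

    /* Minimal sketch of dynamic memory-cache quota updates; the
     * controller interface and the policy are illustrative only. */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_AGENTS 4
    #define CACHE_WAYS 16

    typedef struct {
        int      active;        /* stream currently active?         */
        uint32_t bandwidth_mbs; /* observed bandwidth of the stream */
    } agent_stream_t;

    static uint8_t quota_ways[NUM_AGENTS]; /* ways each agent may use */

    /* Hypothetical hook the driver would use to push a new quota
     * down to the memory cache controller for enforcement. */
    static void mc_set_quota(int agent, uint8_t ways)
    {
        quota_ways[agent] = ways;
        printf("agent %d quota -> %u ways\n", agent, ways);
    }

    /* Recompute quotas when a stream activates or deactivates: give
     * active streams ways in proportion to observed bandwidth. */
    static void update_allocation_policy(const agent_stream_t s[NUM_AGENTS])
    {
        uint32_t total = 0;
        for (int i = 0; i < NUM_AGENTS; i++)
            if (s[i].active) total += s[i].bandwidth_mbs;

        for (int i = 0; i < NUM_AGENTS; i++) {
            uint8_t ways = 0;
            if (s[i].active && total > 0)
                ways = (uint8_t)((uint64_t)CACHE_WAYS * s[i].bandwidth_mbs / total);
            mc_set_quota(i, ways);
        }
    }

    int main(void)
    {
        agent_stream_t streams[NUM_AGENTS] = {
            {1, 800}, {1, 200}, {0, 0}, {1, 1000}
        };
        update_allocation_policy(streams); /* driver reacts to a change */
        return 0;
    }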
Abstract:
Systems, apparatuses, and methods for efficiently allocating data in a cache are described. In various embodiments, a processor decodes an indication in a software application identifying a temporal data set. The data set is flagged with a data set identifier (DSID) indicating temporal data to drop after consumption. When the data set is allocated in a cache, it is stored with a non-replaceable attribute to prevent the cache replacement policy from evicting it before it is dropped. After the data set is read (consumed), a drop command is issued with an indication of the data set's DSID. The data set is removed from the cache without a copy being written back to the lower-level memory. An interrupt is generated to notify firmware or other software of the completion of the drop command.
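The drop flow can be illustrated with a small sketch in C. The cache_line_t layout and the allocate_temporal() and drop_dsid() helpers are hypothetical stand-ins for the described behavior: lines are pinned with a non-replaceable attribute at allocation, and a drop invalidates them without any writeback.

    /* Minimal sketch of DSID-tagged cache lines pinned until dropped;
     * structures and helpers are illustrative, not the patented design. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_LINES 8

    typedef struct {
        bool     valid;
        bool     non_replaceable; /* replacement policy must skip this line */
        uint16_t dsid;            /* data set identifier                    */
    } cache_line_t;

    static cache_line_t cache[NUM_LINES];

    /* Allocate temporal data: pin it so it cannot be evicted early. */
    static void allocate_temporal(int line, uint16_t dsid)
    {
        cache[line] = (cache_line_t){ .valid = true,
                                      .non_replaceable = true,
                                      .dsid = dsid };
    }

    /* Drop command: invalidate all lines of the DSID with no writeback
     * to lower-level memory, then raise a completion notification. */
    static void drop_dsid(uint16_t dsid)
    {
        for (int i = 0; i < NUM_LINES; i++)
            if (cache[i].valid && cache[i].dsid == dsid)
                cache[i] = (cache_line_t){0}; /* discard, no writeback */
        printf("drop of DSID %u complete (interrupt to firmware)\n", dsid);
    }

    int main(void)
    {
        allocate_temporal(0, 42);
        allocate_temporal(3, 42);
        /* ... consumer reads the data set ... */
        drop_dsid(42); /* issued after consumption */
        return 0;
    }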
Abstract:
Systems, apparatuses, and methods for performing coherence processing and memory cache processing in parallel are disclosed. A system includes a communication fabric and a plurality of dual-processing pipelines. Each dual-processing pipeline includes a coherence processing pipeline and a memory cache processing pipeline. The communication fabric forwards a transaction to a given dual-processing pipeline, selecting that pipeline from the plurality of dual-processing pipelines based on a hash of the transaction's address. The given dual-processing pipeline performs a duplicate tag lookup in parallel with a memory cache tag lookup for the transaction. By performing the duplicate tag lookup and the memory cache tag lookup in parallel rather than serially, latency and power consumption are reduced while performance is enhanced.
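A minimal sketch in C of the routing and lookup steps follows; the hash and the two lookup functions are invented placeholders. In hardware the two tag lookups would start in the same cycle, which this sequential software model only approximates.

    /* Minimal sketch of address-hash pipeline selection followed by
     * the two tag lookups of one dual-processing pipeline. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_PIPELINES 4

    /* Fabric picks a dual-processing pipeline from the address bits. */
    static unsigned select_pipeline(uint64_t addr)
    {
        return (unsigned)((addr >> 6) % NUM_PIPELINES); /* simple hash */
    }

    /* Stand-ins for the two lookups done by one dual pipeline. */
    static bool duplicate_tag_hit(uint64_t addr) { return (addr & 0x100) != 0; }
    static bool memory_cache_hit(uint64_t addr)  { return (addr & 0x200) != 0; }

    /* Both lookups are issued for the same transaction and their
     * results combined afterwards, mirroring the parallel lookup. */
    static void process_transaction(uint64_t addr)
    {
        unsigned p   = select_pipeline(addr);
        bool coh_hit = duplicate_tag_hit(addr); /* coherence pipeline    */
        bool mc_hit  = memory_cache_hit(addr);  /* memory cache pipeline */
        printf("addr %#llx -> pipe %u, dup tag hit=%d, cache hit=%d\n",
               (unsigned long long)addr, p, coh_hit, mc_hit);
    }

    int main(void)
    {
        process_transaction(0x1340);
        process_transaction(0x2280);
        return 0;
    }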
Abstract:
An apparatus for processing cache requests in a computing system is disclosed. The apparatus may include a single-port memory, a dual-port memory, and a control circuit. The single-port memory may be configured to store tag information associated with a cache memory, and the dual-port memory may be configured to store state information associated with the cache memory. The control circuit may be configured to receive a request which includes a tag address, and to access the tag and state information stored in the single-port memory and the dual-port memory, respectively, dependent upon the received tag address. A determination of whether the data associated with the received tag address is contained in the cache memory may be made by the control circuit, and the control circuit may update and store state information in the dual-port memory responsive to the determination.
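A small C model of the described arrangement, with illustrative array sizes and state encoding: the tag array is consulted once per request (as a single-port memory would allow), while the state array is both read and updated in the same pass (as a dual-port memory would allow).

    /* Minimal sketch of a tag lookup against a single-port tag array
     * with state kept in a dual-port array; fields are illustrative. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_SETS 16

    typedef enum { INVALID, CLEAN, DIRTY } line_state_t;

    static uint32_t     tag_array[NUM_SETS];   /* single-port: one access/cycle  */
    static line_state_t state_array[NUM_SETS]; /* dual-port: read + write/cycle  */

    /* Look up a request: compare the stored tag, and if it hits, read
     * and update the state through the second port in the same pass. */
    static bool lookup(uint32_t addr, bool is_write)
    {
        uint32_t set = addr % NUM_SETS;
        uint32_t tag = addr / NUM_SETS;

        bool hit = (tag_array[set] == tag) && (state_array[set] != INVALID);
        if (hit && is_write)
            state_array[set] = DIRTY; /* state update via the second port */
        return hit;
    }

    int main(void)
    {
        tag_array[5]   = 0x12;
        state_array[5] = CLEAN;
        printf("hit=%d\n", lookup(0x12 * NUM_SETS + 5, true));
        printf("state now dirty=%d\n", state_array[5] == DIRTY);
        return 0;
    }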
Abstract:
A coherence system includes a storage array that may store duplicate tag information associated with a cache memory of a processor. The system may also include a pipeline unit that includes a number of stages to control accesses to the storage array. The pipeline unit may pass through the pipeline stages, without generating an access to the storage array, an input/output (I/O) request that is received on a fabric. The system may also include a debug engine that may reformat the I/O request from the pipeline unit into a debug request. The debug engine may send the debug request to the pipeline unit via a debug bus. In response to receiving the debug request, the pipeline unit may access the storage array. The debug engine may return a result of the access to the storage array to the source of the I/O request via the fabric.
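The reformatting step can be sketched in C as below. The message layouts and function names are hypothetical; the point is that an ordinary I/O request passes through the pipeline without touching the storage array, while the debug request produced from it does access the array, and the result flows back to the requester.

    /* Minimal sketch of an I/O request being reformatted into a debug
     * request that can access the duplicate-tag storage array. */
    #include <stdint.h>
    #include <stdio.h>

    #define DUP_TAG_ENTRIES 8

    static uint32_t dup_tags[DUP_TAG_ENTRIES]; /* duplicate tag storage array */

    typedef struct { uint32_t src_id; uint32_t payload; } io_request_t;
    typedef struct { uint32_t index;  int      is_read; } debug_request_t;

    /* Debug engine: turn a fabric I/O request into a debug request. */
    static debug_request_t reformat(const io_request_t *io)
    {
        return (debug_request_t){ .index   = io->payload % DUP_TAG_ENTRIES,
                                  .is_read = 1 };
    }

    /* Pipeline: a debug request, unlike a pass-through I/O request,
     * actually accesses the storage array. */
    static uint32_t pipeline_debug_access(const debug_request_t *dbg)
    {
        return dup_tags[dbg->index];
    }

    int main(void)
    {
        dup_tags[3] = 0xBEEF;
        io_request_t io = { .src_id = 7, .payload = 3 };
        debug_request_t dbg = reformat(&io);
        /* Result is returned to the requester over the fabric. */
        printf("to src %u: dup tag = %#x\n",
               io.src_id, pipeline_debug_access(&dbg));
        return 0;
    }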
Abstract:
A mechanism for cache quota control is disclosed. A cache memory is configured to receive access requests from a plurality of agents, wherein a given request from a given agent specifies an identification value associated with that agent. A cache controller is coupled to the cache memory and is configured to store indications of current allocations of the cache memory to individual ones of the plurality of agents. The cache controller is further configured to track requests to the cache memory based on the identification values specified in the requests, and to determine whether to update allocations of the cache memory to the individual ones of the plurality of agents based on the tracked requests.
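A minimal sketch in C of the tracking and update decision, assuming invented counters and a deliberately simple rebalance rule (grant more lines to the busiest agent); the controller's actual policy is not specified by the abstract.

    /* Minimal sketch of quota tracking keyed by per-agent ID values;
     * the threshold-free rebalance rule is invented for illustration. */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_AGENTS 4

    static uint32_t alloc_lines[NUM_AGENTS]; /* current allocation per agent */
    static uint32_t req_count[NUM_AGENTS];   /* tracked requests per ID      */

    /* Record a request carrying the agent's identification value. */
    static void track_request(uint32_t agent_id)
    {
        req_count[agent_id]++;
    }

    /* Periodically decide whether allocations should change: here,
     * simply grant more lines to the busiest agent. */
    static void maybe_update_allocations(void)
    {
        uint32_t busiest = 0;
        for (uint32_t i = 1; i < NUM_AGENTS; i++)
            if (req_count[i] > req_count[busiest]) busiest = i;

        alloc_lines[busiest] += 64; /* illustrative rebalance step */
        printf("agent %u allocation -> %u lines\n",
               busiest, alloc_lines[busiest]);

        for (uint32_t i = 0; i < NUM_AGENTS; i++) req_count[i] = 0;
    }

    int main(void)
    {
        track_request(2);
        track_request(2);
        track_request(1);
        maybe_update_allocations();
        return 0;
    }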