Abstract:
Techniques for handling the queuing of memory accesses prevent excessive requests that implicate a region of memory subject to a high-latency memory operation, such as a memory refresh, memory scrubbing, or an internal bus calibration event, from being passed to a re-order queue of a memory controller. The memory controller includes a queue for storing pending memory access requests, a re-order queue for receiving the requests, and control logic implementing a queue controller that determines whether there is a collision between a received request and an ongoing high-latency memory operation. If there is a collision, transfer of the request to the re-order queue may be rejected outright, or a count of existing queued operations that collide with the high-latency operation may be used to determine whether queuing the new request would exceed a threshold number of such operations.
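A minimal sketch of the admission check described above, assuming a simple address-range model of the region affected by the high-latency operation. The names (MemRequest, HighLatencyOp, QueueController, the threshold parameter) are illustrative, not taken from the abstract.

```cpp
#include <cstdint>
#include <deque>

struct MemRequest {
    uint64_t address;          // target address of the access
};

struct HighLatencyOp {
    bool     active = false;   // e.g. refresh, scrub, or bus-calibration event
    uint64_t regionStart = 0;  // first address of the affected region
    uint64_t regionEnd   = 0;  // one past the last affected address
};

class QueueController {
public:
    explicit QueueController(unsigned collisionThreshold)
        : threshold_(collisionThreshold) {}

    // Returns true if the request may be moved into the re-order queue.
    bool admit(const MemRequest& req, const HighLatencyOp& op) {
        if (!collides(req, op)) {
            reorderQueue_.push_back(req);
            return true;
        }
        // Colliding request: either reject outright (threshold_ == 0) or
        // admit only while the number of queued colliding requests stays
        // below the threshold.
        unsigned queuedCollisions = 0;
        for (const auto& q : reorderQueue_)
            if (collides(q, op)) ++queuedCollisions;
        if (queuedCollisions + 1 > threshold_)
            return false;                       // caller retries later
        reorderQueue_.push_back(req);
        return true;
    }

private:
    static bool collides(const MemRequest& r, const HighLatencyOp& op) {
        return op.active && r.address >= op.regionStart && r.address < op.regionEnd;
    }

    unsigned threshold_;
    std::deque<MemRequest> reorderQueue_;
};
```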
Abstract:
A method for performing refresh operations on a rank of memory devices is disclosed. After the completion of a memory operation, a determination is made whether a refresh backlog count value is less than a predetermined value and the rank of memory devices is being powered down. If the refresh backlog count value is less than the predetermined value and the rank of memory devices is being powered down, an Idle Count threshold value is set to a maximum value such that a refresh operation will be performed after a maximum delay time. If the refresh backlog count value is not less than the predetermined value, or the rank of memory devices is not in a powered-down state, the Idle Count threshold value is set based on the slope of an Idle Delay Function such that a refresh operation will be performed accordingly.
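An illustrative sketch of the threshold selection just described. The names (RankState, idleThresholdFor, the constants) and the linear form assumed for the Idle Delay Function are assumptions for illustration, not details from the abstract.

```cpp
#include <cstdint>

struct RankState {
    unsigned refreshBacklog;   // number of refreshes currently owed to the rank
    bool     poweringDown;     // rank is entering a powered-down state
};

constexpr unsigned kBacklogLimit     = 4;    // "predetermined value"
constexpr unsigned kMaxIdleThreshold = 512;  // maximum delay before forcing a refresh
constexpr unsigned kIdleSlope        = 64;   // slope of the assumed Idle Delay Function

// Called after each memory operation completes for the rank.
unsigned idleThresholdFor(const RankState& rank) {
    if (rank.refreshBacklog < kBacklogLimit && rank.poweringDown) {
        // Small backlog and the rank is powering down: defer as long as possible.
        return kMaxIdleThreshold;
    }
    // Otherwise derive the threshold from the Idle Delay Function: the larger
    // the backlog, the sooner an idle rank is refreshed.
    unsigned decrement = rank.refreshBacklog * kIdleSlope;
    return decrement < kMaxIdleThreshold ? kMaxIdleThreshold - decrement : 0;
}
```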
Abstract:
A non-uniform memory access (NUMA) computer system includes a remote node coupled by a node interconnect to a home node including a home system memory. The remote node includes a plurality of snoopers coupled to a local interconnect. The plurality of snoopers includes a cache that caches a cache line corresponding to but modified with respect to data resident in the home system memory. The cache has a cache controller that issues a deallocate operation on the local interconnect in response to deallocating the modified cache line. The remote node further includes a node controller, coupled between the local interconnect and the node interconnect, that transmits the deallocate operation to the home node with an indication of whether or not a copy of the cache line remains in the remote node following the deallocation. In this manner, the local memory directory associated with the home system memory can be updated to precisely reflect which nodes hold a copy of the cache line.
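A minimal sketch of the deallocate notification path described above. The types (MemoryDirectory, NodeController) and the directory representation are illustrative assumptions.

```cpp
#include <cstdint>
#include <map>
#include <set>

using NodeId   = unsigned;
using LineAddr = uint64_t;

// Per-line record in the home node's local memory directory: which remote
// nodes are believed to hold a copy of the line.
struct MemoryDirectory {
    std::map<LineAddr, std::set<NodeId>> holders;

    // Invoked when a deallocate operation arrives from a remote node.
    void onDeallocate(LineAddr line, NodeId remote, bool copyRemainsInRemoteNode) {
        if (!copyRemainsInRemoteNode)
            holders[line].erase(remote);   // directory now precisely excludes the node
        // If a copy remains (e.g. in another cache on the remote node's local
        // interconnect), the holder entry is intentionally kept.
    }
};

// Node controller on the remote node: forwards the deallocate operation seen
// on the local interconnect to the home node, together with the indication of
// whether any copy of the line remains in the remote node.
struct NodeController {
    void forwardDeallocate(MemoryDirectory& homeDirectory, LineAddr line,
                           NodeId thisNode, bool otherLocalCopyExists) {
        homeDirectory.onDeallocate(line, thisNode, otherLocalCopyExists);
    }
};
```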
Abstract:
System bus snoopers within a multiprocessor system, in which dynamic application sequence behavior information is maintained within cache directories, append that information for the target cache line to their snoop responses. The system controller, which may also maintain dynamic application sequence behavior information in a history directory, employs the available information to append “hints” to the combined response, appends the concatenated dynamic application sequence behavior information to the combined response, or does both. Either the hints or the dynamic application sequence behavior information may be employed by the bus master and other snoopers for cache management.
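A hedged sketch of how behavior information might ride along with snoop responses and the combined response. SnoopResponse, CombinedResponse, HistoryBits, and the hint derivation are all illustrative assumptions.

```cpp
#include <cstdint>
#include <vector>

using HistoryBits = uint8_t;       // per-line dynamic application sequence behavior info

struct SnoopResponse {
    uint8_t     coherenceReply;    // e.g. shared, modified, retry
    HistoryBits history;           // appended by the snooper from its cache directory
};

struct CombinedResponse {
    uint8_t                  coherenceReply;
    uint8_t                  hint;        // system-controller "hint" derived from history
    std::vector<HistoryBits> histories;   // concatenated per-snooper history information
};

// System controller: combine the snoop responses and attach both a hint and
// the concatenated history, since the abstract allows either or both.
CombinedResponse combine(const std::vector<SnoopResponse>& responses) {
    CombinedResponse cr{};
    for (const auto& r : responses) {
        cr.coherenceReply |= r.coherenceReply;   // simplistic reply merge
        cr.histories.push_back(r.history);
        cr.hint |= r.history;                    // placeholder hint derivation
    }
    return cr;
}
```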
Abstract:
A method of maintaining coherency in a cache hierarchy of a processing unit of a computer system is disclosed, wherein the upper-level (L1) cache includes a split instruction/data cache. In one implementation, the L1 data cache is store-through, and each processing unit has a lower-level (L2) cache. When the lower-level cache receives a cache operation requiring invalidation of a program instruction in the L1 instruction cache (i.e., a store operation or a snooped kill), the L2 cache sends an invalidation transaction (e.g., icbi) to the instruction cache. The L2 cache is fully inclusive of both instructions and data. In another implementation, the L1 data cache is write-back, and a store address queue in the processor core is used to continually propagate pipelined address sequences to the lower levels of the memory hierarchy, i.e., to an L2 cache or, if there is no L2 cache, to the system bus. If there is no L2 cache, the cache operations may be snooped directly against the L1 instruction cache.
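An illustrative sketch of the store-through configuration described above: an inclusive L2 that forwards an icbi-style invalidation to the split L1 instruction cache when a store or snooped kill hits a line. Class and method names are assumptions, not taken from the abstract.

```cpp
#include <cstdint>
#include <unordered_set>

using LineAddr = uint64_t;

struct L1InstructionCache {
    std::unordered_set<LineAddr> lines;
    void invalidate(LineAddr line) { lines.erase(line); }   // icbi-equivalent action
};

struct L2Cache {
    // The L2 is fully inclusive of both instructions and data, so any line
    // held by the L1 instruction cache is also tracked here.
    std::unordered_set<LineAddr> lines;
    L1InstructionCache* icache = nullptr;

    // A store from the core or a snooped kill from the bus requires that any
    // stale copy of the line in the L1 instruction cache be invalidated.
    void onStoreOrSnoopedKill(LineAddr line) {
        if (lines.count(line) && icache)
            icache->invalidate(line);
    }
};
```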
Abstract:
According to a first aspect of the present invention, a data processing system is provided that includes a communication network to which multiple devices are coupled. A first of the multiple devices includes a number of requestors (or queues), which are each permanently assigned a respective one of a number of unique tags. In response to a communication request by a requestor within the first device, a tag assigned to the requestor is transmitted on the communication network in conjunction with the requested communication transaction. According to a second aspect of the present invention, a data processing system includes a cache having a cache directory. A status indication indicative of the status of at least one of a plurality of data entries in the cache is stored in the cache directory. In response to receipt of a cache operation request, a determination is made whether to update the status indication. In response to the determination that the status indication is to be updated, the status indication is copied into a shadow register and updated. The status indication is then written back into the cache directory at a later time. The shadow register thus serves as a virtual cache controller queue that dynamically mimics a cache directory entry without functional latency.
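A sketch of the second aspect (the shadow register) under assumed names: DirectoryEntry, ShadowRegister, CacheDirectory. The status indication is copied into the shadow register, updated there, and written back to the directory at a later time.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct DirectoryEntry {
    uint64_t tag;
    uint8_t  status;              // status indication for the cached data entry
};

struct ShadowRegister {
    std::size_t index;            // which directory entry it mirrors
    uint8_t     status;           // working copy being updated
};

struct CacheDirectory {
    std::vector<DirectoryEntry> entries;
    std::optional<ShadowRegister> shadow;

    // On a cache operation request that needs a status update, copy the
    // status into the shadow register and update it there.
    void beginUpdate(std::size_t idx, uint8_t newStatus) {
        shadow = ShadowRegister{idx, newStatus};
    }

    // At a later time, write the shadow copy back into the directory.
    void commit() {
        if (shadow) {
            entries[shadow->index].status = shadow->status;
            shadow.reset();
        }
    }
};
```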
Abstract:
System bus masters within a multiprocessor system in which dynamic application sequence behavior information is maintained within cache directories append the historical access information for the target cache line to their requests. Snoopers and/or the system controller, which may also maintain dynamic application sequence behavior information in a history directory, employ the appended access information for cache management.
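A companion sketch to the snoop-response case above, this time on the request side: the bus master attaches the historical access information kept in its cache directory to the outgoing request. BusRequest and HistoryBits are illustrative names.

```cpp
#include <cstdint>

using HistoryBits = uint8_t;

struct BusRequest {
    uint64_t    address;
    uint8_t     command;     // e.g. read, read-with-intent-to-modify
    HistoryBits history;     // appended dynamic application sequence behavior info
};

// Master side: look up the history for the target line in the local cache
// directory and ship it with the request so snoopers and the system
// controller can use it for cache management.
BusRequest makeRequest(uint64_t addr, uint8_t cmd, HistoryBits directoryHistory) {
    return BusRequest{addr, cmd, directoryHistory};
}
```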
Abstract:
A multiprocessor data processing system requires careful management to maintain cache coherency. Conventional systems using a MESI approach sacrifice some performance through inefficient lock-acquisition and lock-retention techniques. The disclosed system provides additional cache states, indicator bits, and lock-acquisition routines to improve cache performance. The additional cache states allow cache state transition sequences to be optimized. In particular, the claimed system and method provide that a given processor, after acquiring a lock or reservation to a given cache line, will keep the lock in order to make successive modifications to the cache line, instead of releasing it to other processors after making only one modification. By doing so, the overhead typically required to acquire a lock before making any cache line modification is eliminated for successive modifications.
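A loose sketch of the lock-retention idea: an extra per-line flag lets the owning processor keep a line it acquired for a locked update, so successive stores skip the usual re-acquisition overhead. The state representation and API shape are assumptions for illustration, not the claimed cache states.

```cpp
#include <cstdint>
#include <unordered_map>

enum class LineState { Invalid, Shared, Exclusive, Modified };

struct CacheLine {
    LineState state = LineState::Invalid;
    bool lockRetained = false;   // extra state bit: keep ownership across stores
};

struct Cache {
    std::unordered_map<uint64_t, CacheLine> lines;

    // First modification under the lock: acquire ownership and mark retention.
    void acquireAndStore(uint64_t addr) {
        auto& line = lines[addr];
        line.state = LineState::Modified;   // bus transaction to gain ownership elided
        line.lockRetained = true;
    }

    // Successive modifications: if retention is set, no new ownership request
    // (and none of its bus overhead) is needed before the store.
    bool store(uint64_t addr) {
        auto& line = lines[addr];
        return line.state == LineState::Modified && line.lockRetained;
    }

    // Lock release: allow the line to be surrendered to other processors again.
    void release(uint64_t addr) { lines[addr].lockRetained = false; }
};
```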