Abstract:
A data processing apparatus for forming a portion of a coherent cache system comprises at least one master device for performing data processing operations, and a cache coupled to the at least one master device and arranged to store data values for access by that at least one master device when performing the data processing operations. Cache coherency circuitry is responsive to a coherency request from another portion of the coherent cache system to cause a coherency action to be taken in respect of at least one data value stored in the cache. Responsive to an indication that the coherency action has resulted in invalidation of that at least one data value in the cache, refetch control circuitry is used to initiate a refetch of that at least one data value into the cache. Such a mechanism causes the refetch of data into the cache to be triggered by the coherency action performed in response to a coherency request from another portion of the coherent cache system, rather than relying on any actions taken by the at least one master device, thereby providing a very flexible and efficient mechanism for reducing cache latency in a coherent cache system.
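As an illustration only, the following C sketch models the invalidation-triggered refetch described above; the addr_t type and the cache_invalidate()/cache_fetch() helpers are invented stand-ins for the coherency and refetch control circuitry, not the claimed hardware.

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t addr_t;

    /* Invented stand-ins for the cache interface. */
    bool cache_invalidate(addr_t line) { (void)line; return true; }  /* stub: true if a valid line was invalidated */
    void cache_fetch(addr_t line)      { (void)line; }               /* stub: starts a refill of the line */

    /* Entry point for a coherency request arriving from another
     * portion of the coherent cache system. */
    void handle_coherency_request(addr_t line)
    {
        /* If the coherency action invalidated a valid copy, refetch
         * control re-requests the line immediately, without waiting
         * for the master device to miss on it later. */
        if (cache_invalidate(line))
            cache_fetch(line);
    }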
Abstract:
A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast local requests, and thereby the latency of accessing data from remote nodes, in an SMP computer system. A shared invalid cache coherency protocol state is defined that predicts whether a memory read request to read data in a shared cache line can be satisfied within a local node. When a cache line is in the shared invalid state, a valid copy of the data is predicted to be located in the local node. When a cache line is in the invalid state and not in the shared invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory read requests to read data in a cache line that is not currently in the shared invalid state are broadcast first to remote nodes. Memory read requests to read data in a cache line that is currently in the shared invalid state are broadcast first to a local node, and in response to being unable to satisfy the memory read requests within the local node, the memory read requests are broadcast to the remote nodes.
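One way to read the prediction policy is the C sketch below; line_state_t, local_read(), and remote_read() are invented names, and the stub bodies stand in for the node interconnect.

    #include <stdbool.h>
    #include <stdint.h>

    typedef enum { LINE_INVALID, LINE_SHARED_INVALID /* , ... */ } line_state_t;

    /* Stubs standing in for the node interconnect. */
    bool local_read(uint64_t addr)  { (void)addr; return false; }  /* true if satisfied locally */
    void remote_read(uint64_t addr) { (void)addr; }                /* broadcast to remote nodes */

    void issue_memory_read(uint64_t addr, line_state_t state)
    {
        if (state == LINE_SHARED_INVALID) {
            /* Shared invalid: a valid copy is predicted to be in the
             * local node, so try the local node first and fall back
             * to the remote nodes only on a local miss. */
            if (!local_read(addr))
                remote_read(addr);
        } else {
            /* Plain invalid: a valid copy is predicted to be in a
             * remote node, so broadcast to the remote nodes first. */
            remote_read(addr);
        }
    }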
Abstract:
A method and apparatus for implementing a snoop filter unit associated with a single processor in a multiprocessor system. The snoop filter unit has a plurality of ports, each port receiving snoop requests from exactly one dedicated source. Associated with each port is a snoop cache filter that processes each snoop request and records the addresses of the most recent snoop requests for exactly one source. The snoop cache filter uses vector encoding to record the occurrence of snoop requests for a sequence of consecutive cache lines. The address of each snoop request is added to the snoop cache unless it matches an entry already present in the associated snoop cache, in which case the snoop request is discarded; otherwise, the snoop request is enqueued for forwarding to the single processor. Information is removed from all snoop cache filters assigned to all ports in the snoop filter unit when data corresponding to any one of the memory addresses contained in a snoop cache filter is loaded into the cache hierarchy of the processor to which that snoop cache filter is assigned.
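The per-port snoop cache with vector encoding might be modeled as in the following C sketch; the entry count, vector width, and rotating-victim allocation are assumptions, not the patented design.

    #include <stdbool.h>
    #include <stdint.h>

    #define ENTRIES  32   /* invented snoop cache size */
    #define VEC_BITS 8    /* one entry covers 8 consecutive cache lines */

    typedef struct {
        uint64_t tag;     /* line address / VEC_BITS */
        uint8_t  vec;     /* presence bits for VEC_BITS consecutive lines */
        bool     valid;
    } snoop_entry_t;

    static snoop_entry_t snoop_cache[ENTRIES];

    /* Returns true if the snoop request is a recent duplicate and is
     * discarded; otherwise the address is recorded and the request
     * must be enqueued for forwarding to the processor. */
    bool filter_snoop(uint64_t line_addr)
    {
        uint64_t tag = line_addr / VEC_BITS;
        uint8_t  bit = (uint8_t)(1u << (line_addr % VEC_BITS));

        for (int i = 0; i < ENTRIES; i++) {
            if (snoop_cache[i].valid && snoop_cache[i].tag == tag) {
                if (snoop_cache[i].vec & bit)
                    return true;            /* duplicate: discard */
                snoop_cache[i].vec |= bit;  /* record and forward */
                return false;
            }
        }
        static int victim;                  /* rotating allocation, an assumption */
        snoop_cache[victim] = (snoop_entry_t){ .tag = tag, .vec = bit, .valid = true };
        victim = (victim + 1) % ENTRIES;
        return false;
    }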
Abstract:
A cache control apparatus determines whether to adopt data acquired by a speculative fetch, a speculative fetch being a memory fetch request output before it becomes clear whether the data requested by a CPU is stored in that CPU's cache. The determination is made by monitoring the status of the speculative fetch together with a combined time period obtained by adding up (a) the time from when the speculative fetch is output to when it reaches a memory controller, and (b) the time from completion of writing data to memory, as specified by a data write command issued, before issuance of the speculative fetch, for the same address as that for which the speculative fetch is issued, to when a response to the data write command is returned.
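The C sketch below lays out the two monitored quantities with invented timestamp names; the abstract does not fix the exact adopt/discard criterion, so the threshold comparison shown is only one plausible reading.

    #include <stdbool.h>
    #include <stdint.h>

    /* Invented timestamp names, in arbitrary time units. */
    typedef struct {
        uint64_t fetch_output;    /* speculative fetch is output */
        uint64_t fetch_at_mc;     /* it reaches the memory controller */
        uint64_t write_complete;  /* prior write to the same address completes */
        uint64_t write_response;  /* response of that write command returns */
    } spec_fetch_status_t;

    /* The combined period named in the abstract: fetch transit time
     * plus the write's completion-to-response time. */
    uint64_t combined_period(const spec_fetch_status_t *s)
    {
        return (s->fetch_at_mc - s->fetch_output)
             + (s->write_response - s->write_complete);
    }

    /* One plausible reading (the abstract leaves the criterion open):
     * adopt the speculatively fetched data while the combined period
     * stays under an invented threshold. */
    bool adopt_speculative_data(const spec_fetch_status_t *s, uint64_t threshold)
    {
        return combined_period(s) <= threshold;
    }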
Abstract:
In at least one embodiment, a processor detects during execution of program code whether a load instruction within the program code is associated with a hint. In response to detecting that the load instruction is not associated with a hint, the processor retrieves a full cache line of data from the memory hierarchy into the processor in response to the load instruction. In response to detecting that the load instruction is associated with a hint, the processor retrieves a partial cache line of data into the processor from the memory hierarchy in response to the load instruction.
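A minimal C sketch of the hint-driven fill sizing follows; the line and partial-fill sizes and the fill_bytes() helper are invented for illustration.

    #include <stdbool.h>
    #include <stdint.h>

    enum { LINE_BYTES = 128, PARTIAL_BYTES = 32 };  /* invented sizes */

    /* Stub standing in for the memory hierarchy interface. */
    void fill_bytes(uint64_t addr, unsigned nbytes) { (void)addr; (void)nbytes; }

    /* A hinted load retrieves only a partial cache line; an unhinted
     * load retrieves the full line. */
    void execute_load(uint64_t addr, bool has_hint)
    {
        fill_bytes(addr, has_hint ? PARTIAL_BYTES : LINE_BYTES);
    }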
Abstract:
A system and method for handling speculative read requests for a memory controller in a computer system are provided. In one example, a method includes the steps of providing a speculative read threshold corresponding to a selected percentage of the total number of reads that can be speculatively issued, and intermixing demand reads and speculative reads in accordance with the speculative read threshold. In another example, a computer system includes a CPU, a memory controller, memory, a bus connecting the CPU, memory controller and memory, circuitry for providing a speculative read threshold corresponding to a selected percentage of the total number of reads that can be speculatively issued, and circuitry for intermixing demand reads and speculative reads in accordance with the speculative read threshold. In another example, a method includes the steps of providing a speculative dispatch time threshold corresponding to a selected percentage of a period of time required to search a cache of the computer system, and intermixing demand reads and speculative reads in accordance with the speculative dispatch time threshold.
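The threshold-gated intermixing could look like the C sketch below; the read_mixer_t counters are an invented model of the memory controller's bookkeeping.

    #include <stdbool.h>

    /* Invented model of the memory controller's bookkeeping. */
    typedef struct {
        unsigned issued_total;   /* reads issued so far, demand + speculative */
        unsigned issued_spec;    /* of those, issued speculatively */
        unsigned threshold_pct;  /* speculative read threshold, e.g. 25 */
    } read_mixer_t;

    /* Gate each candidate speculative read so that, after issuing it,
     * the speculative share of all issued reads stays at or below the
     * threshold percentage; demand reads are never gated. */
    bool may_issue_speculative(const read_mixer_t *m)
    {
        return 100ull * (m->issued_spec + 1u)
            <= (unsigned long long)m->threshold_pct * (m->issued_total + 1u);
    }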
Abstract:
A cache coherence manager, disposed in a multi-core microprocessor, includes a request unit, an intervention unit, a response unit and an interface unit. The request unit receives coherent requests and selectively issues speculative requests in response. The interface unit selectively forwards the speculative requests to a memory. The interface unit includes at least three tables. Each entry in the first table represents an index to the second table. Each entry in the second table represents an index to the third table. The entry in the first table is allocated when a response to an associated intervention message is stored in the first table but before the speculative request is received by the interface unit. The entry in the second table is allocated when the speculative request is stored in the interface unit. The entry in the third table is allocated when the speculative request is issued to the memory.
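The chained indexing across the three tables might be pictured as in this C sketch; the table sizes and field names are invented, and only the indirection and the allocation timing noted in the comments follow the abstract.

    #include <stdint.h>

    enum { T1_ENTRIES = 16, T2_ENTRIES = 16, T3_ENTRIES = 16 };  /* invented geometry */

    typedef struct { int t2_index; } t1_entry_t;  /* allocated when the intervention response is stored, before the speculative request reaches the interface unit */
    typedef struct { int t3_index; } t2_entry_t;  /* allocated when the speculative request is stored in the interface unit */
    typedef struct { uint64_t addr; } t3_entry_t; /* allocated when the speculative request is issued to memory */

    t1_entry_t table1[T1_ENTRIES];
    t2_entry_t table2[T2_ENTRIES];
    t3_entry_t table3[T3_ENTRIES];

    /* Walking the chain from a first-table entry to the memory address
     * of its speculative request. */
    uint64_t request_addr(int t1_index)
    {
        return table3[table2[table1[t1_index].t2_index].t3_index].addr;
    }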
Abstract:
In a computer system with a memory hierarchy, when a high-level cache supplies a data copy to a low-level cache, the shared copy can be either volatile or non-volatile. When the data copy is later replaced from the low-level cache, if the data copy is non-volatile, it needs to be written back to the high-level cache; otherwise it can be simply flushed from the low-level cache. The high-level cache can employ a volatile-prediction mechanism that adaptively determines whether a volatile copy or a non-volatile copy should be supplied when the high-level cache needs to send data to the low-level cache. An exemplary volatile-prediction mechanism suggests use of a non-volatile copy if the cache line has been accessed consecutively by the low-level cache. Further, the low-level cache can employ a volatile-promotion mechanism that adaptively changes a data copy from volatile to non-volatile according to some promotion policy, or changes a data copy from non-volatile to volatile according to some demotion policy.
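As a rough model, the C sketch below captures the replacement rule and the exemplary consecutive-access predictor; the writeback_to_high_level() helper and the two-access threshold are assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint64_t addr;
        bool     non_volatile;  /* must be written back on replacement */
    } copy_t;

    /* Stub standing in for the write-back path to the high-level cache. */
    void writeback_to_high_level(const copy_t *c) { (void)c; }

    /* Replacement rule: a non-volatile copy is written back to the
     * high-level cache; a volatile copy is simply flushed. */
    void replace_from_low_level(const copy_t *c)
    {
        if (c->non_volatile)
            writeback_to_high_level(c);
        /* else: flush only, nothing to write back */
    }

    /* Exemplary volatile prediction: supply a non-volatile copy if the
     * line has been accessed consecutively by the low-level cache,
     * read here as two or more accesses in a row (an assumption). */
    bool supply_non_volatile(unsigned consecutive_accesses)
    {
        return consecutive_accesses >= 2;
    }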
Abstract:
A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.
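The parallel sub-filter composition per port can be sketched in C as below; the function-pointer representation and the sub-filter count are illustrative, since in hardware the sub-filter elements operate concurrently.

    #include <stdbool.h>
    #include <stdint.h>

    /* One sub-filter element: returns true if the snoop request can be
     * safely discarded. The sequential loop is only a software model
     * of concurrently operating hardware sub-filters. */
    typedef bool (*sub_filter_t)(uint64_t line_addr);

    enum { SUB_FILTERS = 3 };   /* invented count */

    /* A port snoop filter forwards a snoop request to its processing
     * unit only if none of its sub-filter elements discards it. */
    bool port_forwards(sub_filter_t f[SUB_FILTERS], uint64_t line_addr)
    {
        for (int i = 0; i < SUB_FILTERS; i++)
            if (f[i](line_addr))
                return false;   /* filtered: not forwarded */
        return true;            /* forwarded to the processing unit */
    }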
Abstract:
Embodiments of the invention are generally directed to systems, methods, and apparatuses for linear to physical address translation with support for page attributes. In some embodiments, a system receives an instruction to translate a memory pointer to a physical memory address for a memory location. The system may return the physical memory address and one or more page attributes. Other embodiments are described and claimed.
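The software-visible behavior might resemble the C sketch below; translation_t and the translate_pointer() intrinsic wrapper are hypothetical, as the embodiments specify only that the physical memory address and one or more page attributes are returned.

    #include <stdbool.h>
    #include <stdint.h>

    /* Invented field names for the instruction's result. */
    typedef struct {
        uint64_t phys_addr;   /* physical memory address */
        uint64_t attributes;  /* one or more page attributes */
        bool     valid;       /* a translation existed for the pointer */
    } translation_t;

    /* Hypothetical intrinsic wrapping the translate instruction. */
    translation_t translate_pointer(const void *linear_ptr)
    {
        (void)linear_ptr;
        return (translation_t){0};  /* stub: the real work is in hardware */
    }

    void example(const int *p)
    {
        translation_t t = translate_pointer(p);
        if (t.valid) {
            /* consume t.phys_addr and t.attributes */
        }
    }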