摘要:
In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write though, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.
摘要:
An apparatus for processing coherency transactions in a computing system is disclosed. The apparatus may include a request queue circuit, a duplicate tag circuit, and a memory interface unit. The request queue circuit may be configured to generate a speculative read request dependent upon a received read transaction. The duplicate tag circuit may be configured to store copies of tag from one or more cache memories, and to generate a kill message in response to a determination that data requested in the received read transaction is stored in a cache memory. The memory interface unit may be configured to store the generated speculative read request dependent upon a stall condition. The stored speculative read request may be sent to a memory controller dependent upon the stall condition. The memory interface unit may be further configured to delete the speculative read request in response to the kill message.
摘要:
Embodiments of the invention are generally directed to systems, methods, and apparatuses for linear to physical address translation with support for page attributes. In some embodiments, a system receives an instruction to translate a memory pointer to a physical memory address for a memory location. The system may return the physical memory address and one or more page attributes. Other embodiments are described and claimed.
摘要:
A computer system and a method are provided that reduce the amount of time and computing resources that are required to perform a hardware table walk (HWTW) in the event that a translation lookaside buffer (TLB) miss occurs. If a TLB miss occurs when performing a stage 2 (S2) HWTW to find the PA at which a stage 1 (S1) page table is stored, the MMU uses the IPA to predict the corresponding PA, thereby avoiding the need to perform any of the S2 table lookups. This greatly reduces the number of lookups that need to be performed when performing these types of HWTW read transactions, which greatly reduces processing overhead and performance penalties associated with performing these types of transactions.
摘要:
A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.
摘要:
A method, system, and computer program product for data transfer to memory over an input/output (I/O) interconnect are provided. The method includes reading a mailbox stored on an I/O adapter in response to a request to initiate an I/O transaction. The mailbox stores a directive that defines a condition under which cache injection for data values in the I/O transaction will not be performed. The method also includes embedding a hint into the I/O transaction when the directive in the mailbox matches data received in the request, and executing the I/O transaction. The execution of the I/O transaction causes a system chipset or I/O hub for a processor receiving the I/O transaction, to directly store the data values from the I/O transaction into system memory and to suppress the cache injection of the data values into a cache memory upon presence of the hint in a header of the I/O transaction.
摘要:
An apparatus to resolve cache coherency is presented. In one embodiment, the apparatus includes a microprocessor comprising one or more processing cores. The apparatus also includes a probe speculative address file unit, coupled to a cache memory, comprising a plurality of entries. Each entry includes a timer and a tag associated with a memory line. The apparatus further includes control logic to determine whether to service an incoming probe based at least in part on a timer value.
摘要:
A data processing apparatus for forming a portion of a coherent cache system comprises at least one master device for performing data processing operations, and a cache coupled to the at least one master device and arranged to store data values for access by that at least one master device when performing the data processing operations. Cache coherency circuitry is responsive to a coherency request from another portion of the coherent cache system to cause a coherency action to be taken in respect of at least one data value stored in the cache. Responsive to an indication that the coherency action has resulted in invalidation of that at least one data value in the cache, refetch control circuitry is used to initiate a refetch of that at least one data value into the cache. Such a mechanism causes the refetch of data into the cache to be triggered by the coherency action performed in response to a coherency request from another portion of the coherent cache system, rather than relying on any actions taken by the at least one master device, thereby providing a very flexible and efficient mechanism for reducing cache latency in a coherent cache system.
摘要:
In a computer system with a memory hierarchy, when a high-level cache supplies a data copy to a low-level cache, the shared copy can be either volatile or non-volatile. When the data copy is later replaced from the low-level cache, if the data copy is non-volatile, it needs to be written back to the high-level cache; otherwise it can be simply flushed from the low-level cache. The high-level cache can employ a volatile-prediction mechanism that adaptively determines whether a volatile copy or a non-volatile copy should be supplied when the high-level cache needs to send data to the low-level cache. An exemplary volatile-prediction mechanism suggests use of a non-volatile copy if the cache line has been accessed consecutively by the low-level cache. Further, the low-level cache can employ a volatile-promotion mechanism that adaptively changes a data copy from volatile to non-volatile according to some promotion policy, or changes a data copy from non-volatile to volatile according to some demotion policy.
摘要:
A processing unit for a data processing system includes a processor core having one or more execution units for processing instructions and a register file for storing data accessed in processing of the instructions. The processing unit also includes a multi-level cache hierarchy coupled to and supporting the processor core. The multi-level cache hierarchy includes at least one upper level of cache memory having a lower access latency and at least one lower level of cache memory having a higher access latency. The lower level of cache memory, responsive to receipt of a memory access request that hits only a partial cache line in the lower level cache memory, sources the partial cache line to the at least one upper level cache memory to service the memory access request. The at least one upper level cache memory services the memory access request without caching the partial cache line.