摘要:
A method and apparatus for optimizing the performance of a multiple cache system computer having separate caches for data and instructions in which all writes to the data cache are monitored. If the address tag of the item being written matches one of a list of tags representing valid instructions currently stored in the instruction cache, a flag called I.sub.-- FLUSH.sub.-- ON.sub.-- REI is set. Until this flag is set, REI (Return from Exception or Interrupt) instructions will not flush the instruction cache. When the flag is set, an REI command will also flush or clear the instruction cache. Thus, the instruction cache is only flushed when an address referenced by an instruction is modified, so as to reduce the number of times the cache is flushed and optimize the computer's speed of operation.
摘要:
During the operation of a computer system whose processor is supported by virtual cache memory, the cache must be cleared and refilled to allow the replacement of old data with more current data. The cache is filled with either P or N (N>P) blocks of data. Numerous methods for dynamically selecting N or P blocks of data are possible. For instance, immediately after the cache has been flushed, the miss is refilled with N blocks, moving data to the cache at high speed. Once the cache is mostly full, the miss tends to be refilled with P blocks. This maintains the currency of the data in the cache, while simultaneously avoiding writing-over of data already in the cache. The invention is useful in a multi-user/multi-tasking system where the program being run changes frequently, necessitating flushing and clearing the cache frequently.
摘要:
The invention is directed to a cache memory system in a data processor including a virtual cache memory, a physical cache memory, a virtual to physical translation buffer, a physical to virtual backmap, an Old-PA pointer and a lockout register. The backmap implements invalidates by clearing the valid flags in virtual cache memory. The Old-PA pointer indicates the backmap entry to be invalidated after a reference misses in the virtual cache. The physical address for data written to virtual cache memory is entered to Old-PA pointer by the translation buffer. The lockout register arrests all references to data which may have synonyms in virtual cache memory. The backmap is also used to invalidate any synonyms.
摘要:
Method and apparatus to efficiently maintain cache coherency by reading/writing a domain state field associated with a tag entry within a cache tag directory. A value may be assigned to a domain state field of a tag entry in a cache tag directory. The cache tag directory may belong to a hierarchy of cache tag directories. Each tag entry may be associated with a cache line from a cache belonging to a first domain. The first domain may contain multiple caches. The value of the domain state field may indicate whether its associated cache line can be read or changed.
摘要:
A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining that a local last accessor of the memory address may have a copy of the requested data up to date with the memory. The local last accessor may be within a local domain that the requester belongs to. The method may further comprise sending a cache probe to the local last accessor and retrieving a latest value of the requested data from the local last accessor to the requester.
摘要:
An apparatus to resolve cache coherency is presented. In one embodiment, the apparatus includes a microprocessor comprising one or more processing cores. The apparatus also includes a probe speculative address file unit, coupled to a cache memory, comprising a plurality of entries. Each entry includes a timer and a tag associated with a memory line. The apparatus further includes control logic to determine whether to service an incoming probe based at least in part on a timer value.
摘要:
An apparatus and method is described herein for intelligently spilling cache lines. Usefulness of cache lines previously spilled from a source cache is learned, such that later evictions of useful cache lines from a source cache are intelligently selected for spill. Furthermore, another learning mechanism—cache spill prediction—may be implemented separately or in conjunction with usefulness prediction. The cache spill prediction is capable of learning the effectiveness of remote caches at holding spilled cache lines for the source cache. As a result, cache lines are capable of being intelligently selected for spill and intelligently distributed among remote caches based on the effectiveness of each remote cache in holding spilled cache lines for the source cache.
摘要:
Multi-processor systems and methods are provided. One embodiment relates to a multi-processor system that may comprise a processor having a processor pipeline that executes program instructions across at least one memory barrier with data from speculative data fills that are provided in response to source requests, and a log that retains executed load instruction entries associated with executed program instruction. The executed load instruction entries may be retired if a cache line associated with data of the speculative data fill has not been invalidated in an epoch that is different from the epoch in which the executed load instruction is executed.
摘要:
A technique selectively imposes inter-reference ordering between memory reference operations issued by a processor of a multiprocessor system to addresses within a page pertaining to a page table entry (PTE) that is affected by a translation buffer (TB) miss flow routine. The TB miss flow is used to retrieve information contained in the PTE for mapping a virtual address to a physical address and, subsequently, to allow retrieval of data at the mapped physical address. The PTE that is retrieved in response to a memory reference (read) operation is not loaded into the TB until a commit-signal associated with that read operation is returned to the processor. Once the PTE and associated commit-signal are returned, the processor loads the PTE into the TB so that it can be used for a subsequent read operation directed to the data at the physical address.
摘要:
A multi-node computer network includes a plurality of nodes coupled together via a data link. Each of the nodes includes a local memory, which further comprises a shared memory. Certain items of data that are to be shared by the nodes are stored in the shared portion of memory. Associated with each of the shared data items is a data structure. When a node sharing data with other nodes in the system seeks to modify the data, it transmits the modifications over the data link to the other nodes in the network. Each update is received in order by each node in the cluster. As part of the last transmission by the modifying node, an acknowledgement request is sent to the receiving nodes in the cluster. Each node that receives the acknowledgment request returns an acknowledgement to the sending node. The returned acknowledgement is written to the data structure associated with the shared data item. If there is an error during the transmission of the message, the receiving node does not transmit an acknowledgement, and the sending node is thereby notified that an error has occurred.