Abstract:
Methods for storing replacement data in a multi-way associative cache are disclosed. One method comprises logically dividing the cache's sets into segments of at least one cache way each; searching a cache set, in accordance with a segment search sequence, for a segment currently comprising a way that has not yet been accessed during the current cycle of the segment search sequence; searching that segment, in accordance with a way search sequence, for a way that has not yet been accessed during the current way search cycle; and storing the replacement data in the first such way found. A cache controller that performs such methods is also disclosed.
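For concreteness, the two-level search can be sketched in software. The C fragment below is a minimal model under assumed parameters (four segments of two ways each; the names `seg_accessed`, `way_accessed`, and `pick_victim_way` are illustrative, not the patent's terms): each level walks its fixed sequence looking for the first entry not yet accessed in the current cycle, restarting the cycle when every entry has been touched.

```c
#include <stdbool.h>

#define NUM_SEGMENTS 4          /* assumed geometry: four segments ... */
#define WAYS_PER_SEG 2          /* ... of two ways each (an 8-way set) */

/* Per-set bookkeeping: one "accessed this cycle" bit per segment/way. */
typedef struct {
    bool seg_accessed[NUM_SEGMENTS];
    bool way_accessed[NUM_SEGMENTS][WAYS_PER_SEG];
} set_state_t;

/* Pick the way within a set that receives the replacement data. */
static int pick_victim_way(set_state_t *s)
{
    /* Segment search: first segment still holding an un-accessed way. */
    int seg = -1;
    for (int i = 0; i < NUM_SEGMENTS; i++) {
        if (!s->seg_accessed[i]) { seg = i; break; }
    }
    if (seg < 0) {              /* cycle exhausted: start a new one */
        for (int i = 0; i < NUM_SEGMENTS; i++) s->seg_accessed[i] = false;
        seg = 0;
    }

    /* Way search within the chosen segment. */
    int way = -1;
    for (int j = 0; j < WAYS_PER_SEG; j++) {
        if (!s->way_accessed[seg][j]) { way = j; break; }
    }
    if (way < 0) {              /* way cycle exhausted in this segment */
        for (int j = 0; j < WAYS_PER_SEG; j++) s->way_accessed[seg][j] = false;
        way = 0;
    }

    s->way_accessed[seg][way] = true;
    /* Segment is used up once all of its ways have been accessed. */
    bool all = true;
    for (int j = 0; j < WAYS_PER_SEG; j++) all &= s->way_accessed[seg][j];
    if (all) s->seg_accessed[seg] = true;

    return seg * WAYS_PER_SEG + way;    /* global way index in the set */
}
```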
Abstract:
A method for executing a load-locked and a store-conditional instruction in a processor achieves an atomic read-write operation on a memory block. First, the load-locked instruction is executed to read the memory block. In response, the processor issues a read-modify system command to read the block and take ownership of it, sets a lock flag for the address of the memory block, and writes the value of the memory block into its cache as a cache copy. The lock flag is reset if the processor receives any invalidate message for the cache copy of the memory block. If the processor receives an ownership request message after executing the load-locked instruction, it waits for a selected time interval before surrendering ownership of the memory block. The processor then executes the store-conditional instruction: it tests the lock flag and, if the lock flag is set, writes to the cache copy of the memory block. If the lock flag has been reset, the processor ends the store-conditional instruction without writing to the cache copy of the memory block.
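A minimal C sketch of the lock-flag protocol follows. The state layout and function names (`ll_sc_state_t`, `on_invalidate`, and so on) are assumptions made for illustration, and the timed ownership-surrender delay is elided.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-processor state for the load-locked/store-conditional
 * sequence described above; all field names are illustrative. */
typedef struct {
    bool     lock_flag;       /* set by LL, cleared by an invalidate hit */
    uint64_t locked_addr;     /* address the LL took ownership of */
    uint64_t cache_copy;      /* value cached by the LL read */
} ll_sc_state_t;

/* Load-locked: read the block, take ownership, arm the lock flag. */
uint64_t load_locked(ll_sc_state_t *p, uint64_t addr, uint64_t mem_value)
{
    p->lock_flag   = true;
    p->locked_addr = addr;
    p->cache_copy  = mem_value;   /* stands in for the read-modify command */
    return mem_value;
}

/* Another processor invalidated our cache copy: the sequence must fail. */
void on_invalidate(ll_sc_state_t *p, uint64_t addr)
{
    if (p->lock_flag && addr == p->locked_addr)
        p->lock_flag = false;
}

/* Store-conditional: succeeds (returns true and writes the cache copy)
 * only if no invalidate arrived since the load-locked. */
bool store_conditional(ll_sc_state_t *p, uint64_t addr, uint64_t value)
{
    if (!p->lock_flag || addr != p->locked_addr)
        return false;             /* fail: do not write the cache copy */
    p->cache_copy = value;
    p->lock_flag  = false;
    return true;
}
```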
Abstract:
A mechanism optimizes the generation of a commit-signal by control logic of a multiprocessor system in response to a memory reference operation issued by a processor to a local node of the system, where a hierarchical switch interconnects a plurality of nodes. The mechanism generally comprises a structure that indicates whether the memory reference operation affects processors of other nodes of the multiprocessor system. An ordering point of the local node generates an optimized commit-signal when the structure indicates that the memory reference operation does not affect those other processors.
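The decision the ordering point makes reduces to a small predicate, sketched below in C with assumed names; the indicating structure is abstracted as a boolean input.

```c
#include <stdbool.h>

/* Illustrative classification of a memory reference at the local node's
 * ordering point; names are assumptions, not the patent's terms. */
typedef enum { COMMIT_LOCAL_FAST, COMMIT_AFTER_SWITCH } commit_kind_t;

/* The indicating structure is modeled as a predicate telling the
 * ordering point whether any remote node is affected by the block. */
commit_kind_t choose_commit(bool affects_remote_processors)
{
    /* Purely local reference: the ordering point may issue the
     * commit-signal immediately, without waiting on a round trip
     * through the hierarchical switch. */
    return affects_remote_processors ? COMMIT_AFTER_SWITCH
                                     : COMMIT_LOCAL_FAST;
}
```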
Abstract:
A multiprocessor computer system releases a victim data buffer storing victim data when system control logic determines that the count of probe messages pending at a specified time equals the number of such probe messages that have had an address comparison performed after that time. The specified time occurs when a command to write the victim data to main memory passes a serialization point of the computer system. The address comparison compares the target address of a probe message with the addresses of data stored in the victim data buffer and in the associated cache of a CPU of the computer system.
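The counting rule can be modeled as follows. This C sketch uses assumed names (`victim_buf_t`, `on_serialization`, `on_probe_compared`) and assumes `on_probe_compared` is invoked only for address comparisons performed after the serialization point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative victim-buffer bookkeeping; the counter and field names
 * are assumptions made for this sketch. */
typedef struct {
    bool     valid;            /* buffer still holds live victim data */
    uint64_t victim_addr;
    int      probes_to_match;  /* probes pending when the victim write
                                  passed the serialization point */
} victim_buf_t;

/* Called when the write of the victim to main memory passes the
 * system's serialization point: snapshot the pending-probe count. */
void on_serialization(victim_buf_t *vb, int probes_pending_now)
{
    vb->probes_to_match = probes_pending_now;
    if (vb->probes_to_match == 0)
        vb->valid = false;     /* nothing pending: release at once */
}

/* Called after each probe's address comparison against the victim
 * buffer (and cache).  Returns true once the buffer may be released. */
bool on_probe_compared(victim_buf_t *vb, uint64_t probe_addr)
{
    (void)probe_addr;          /* the match/forward action is elided */
    if (vb->probes_to_match > 0 && --vb->probes_to_match == 0) {
        vb->valid = false;     /* counts now equal: safe to release */
        return true;
    }
    return false;
}
```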
Abstract:
In accordance with the present invention, a method and apparatus are provided for maintaining the coherency of victim data from the time the data is stored in a victim data buffer until the time it is written into main memory. Alternatively, the coherency of the victim data is preserved until a determination is made that pending probe messages do not target the victim data; at that time the victim data buffer can be deallocated. With both arrangements, a central processing unit can release a victim data buffer at a point in time other than when the data stored therein is read from the buffer. Thus, the central processing unit can perform the release or deallocation of the buffer when it is most efficient and no further access to the data is required.
Abstract:
A technique reduces the latency of a memory barrier (MB) operation used to impose an inter-reference order between sets of memory reference operations issued by a processor to a multiprocessor system having a shared memory. The technique comprises issuing the MB operation immediately after issuing a first set of memory reference operations (i.e., the pre-MB operations) without waiting for responses to those operations. Issuance of the MB operation to the system results in serialization of that operation and generation of an MB Acknowledgment (MB-Ack) command. The MB-Ack is loaded into a probe queue of the issuing processor and, according to the invention, functions to pull in all previously ordered invalidate and probe commands in that queue. By ensuring that the probes and invalidates are ordered before the MB-Ack is received at the issuing processor, the inventive technique provides the appearance that all pre-MB references have completed.
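A C sketch of the pull-in behavior follows, under the assumption that the probe queue is serviced in order; the entry kinds and names (`ENTRY_MB_ACK`, `drain_until_mb_ack`) are illustrative.

```c
#include <stddef.h>

/* Illustrative probe-queue entry; names are assumptions for this sketch. */
typedef enum { ENTRY_PROBE, ENTRY_INVALIDATE, ENTRY_MB_ACK } entry_kind_t;

typedef struct {
    entry_kind_t kind;
    /* address, source node, etc. elided */
} probe_entry_t;

/* Drain the issuing processor's probe queue up to and including the
 * MB-Ack.  Because the MB-Ack was serialized behind the earlier probes
 * and invalidates, servicing everything ahead of it "pulls in" those
 * ordered commands, so all pre-MB references appear complete.  Returns
 * the number of entries consumed. */
size_t drain_until_mb_ack(probe_entry_t *q, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (q[i].kind == ENTRY_MB_ACK)
            return i + 1;         /* the MB may now be retired */
        /* service_probe_or_invalidate(&q[i]);  apply to local cache */
    }
    return len;                   /* MB-Ack not yet received */
}
```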
Abstract:
A technique reduces the latency of inter-reference ordering between sets of memory reference operations in a multiprocessor system having a shared memory that is distributed among a plurality of processors that share a cache. According to the technique, each processor sharing a cache inherits a commit-signal that is generated by control logic of the multiprocessor system in response to a memory reference operation issued by another processor sharing that cache. The commit-signal facilitates serialization among the processors and shared memory entities of the multiprocessor system by indicating the apparent completion of the memory reference operation to those entities of the system.
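The inheritance itself is simple to model. In the C sketch below (two processors per cache is an assumed configuration), delivery of a commit-signal marks the operation apparently complete for every processor sharing the cache, not just the issuer.

```c
#include <stdbool.h>

#define PROCS_PER_CACHE 2   /* assumed: two processors share one cache */

typedef struct {
    bool commit_seen[PROCS_PER_CACHE];   /* per-processor view */
} shared_cache_t;

/* When control logic delivers a commit-signal for an operation issued
 * by `issuer`, every processor sharing the cache inherits it, so each
 * can treat the operation as apparently complete for ordering. */
void deliver_commit_signal(shared_cache_t *c, int issuer)
{
    (void)issuer;   /* inheritance: the signal is not issuer-private */
    for (int p = 0; p < PROCS_PER_CACHE; p++)
        c->commit_seen[p] = true;
}
```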
Abstract:
A prediction mechanism for improving direct-mapped cache performance is shown to include a direct-mapped cache partitioned into a plurality of pseudo-banks. Prediction means provide a prediction index that is appended to the cache index to form the entire address for addressing the direct-mapped cache. One embodiment of the prediction means includes a prediction cache, advantageously larger than the pseudo-banks of the direct-mapped cache, which stores the prediction index for each cache location. A second embodiment includes a plurality of partial tag stores, each holding a predetermined number of tag bits for the data in each bank; a comparison of the tags generates a match in one of the tag stores, which in turn is used to generate a prediction index. A third embodiment, for use with a direct-mapped cache divided into two partitions, includes a distinguishing-bit RAM that provides the bit number of any bit which differs between the tags at the same location in the different banks. The bit number is used in conjunction with a complement signal to provide the prediction index for addressing the direct-mapped cache.
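The second embodiment lends itself to a short sketch. The C fragment below assumes a four-bank, 1024-set geometry with 64-byte lines and 6-bit partial tags (all illustrative): a partial-tag match selects the predicted bank, whose index is appended to the cache index to address the direct-mapped array.

```c
#include <stdint.h>

#define NUM_BANKS    4                  /* assumed pseudo-bank count */
#define SET_BITS     10                 /* assumed cache-index width */
#define NUM_SETS     (1u << SET_BITS)
#define PARTIAL_BITS 6                  /* assumed partial-tag width */

/* Second-embodiment sketch: one partial-tag store per pseudo-bank. */
static uint8_t partial_tags[NUM_BANKS][NUM_SETS];

/* Predict which pseudo-bank holds the line for this address. */
unsigned predict_bank(uint32_t addr)
{
    uint32_t index = (addr >> 6) & (NUM_SETS - 1);           /* 64B lines */
    uint8_t  ptag  = (addr >> (6 + SET_BITS))
                     & ((1u << PARTIAL_BITS) - 1);

    for (unsigned b = 0; b < NUM_BANKS; b++)
        if (partial_tags[b][index] == ptag)
            return b;            /* partial-tag match: predict this bank */
    return 0;                    /* no match: default prediction */
}

/* Full cache address = prediction index appended to the cache index. */
uint32_t predicted_slot(uint32_t addr)
{
    uint32_t index = (addr >> 6) & (NUM_SETS - 1);
    return ((uint32_t)predict_bank(addr) << SET_BITS) | index;
}
```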
Abstract:
An apparatus for allocating data to and retrieving data from a cache includes a memory subsystem coupled between a processor and a memory to provide the processor with quick access to memory data. The memory subsystem includes a cache memory. The address provided to the memory subsystem is divided into a cache index and a tag, and the cache index is hashed to provide a plurality of alternative addresses for accessing the cache. During a cache read, each of the alternative addresses is selected to search for the data, responsive to an indicator of the validity of the data at those locations. The selection of the alternative address may be done through a mask having a number of bits corresponding to the number of alternative addresses; each bit indicates whether the alternative address at that position should be used during the access of the cache in search of the data. Alternatively, a memory device that has more entries than the cache has blocks may be used to store the select value of the best alternative address for locating the data. Data is allocated to each alternative address based upon a modified least-recently-used technique wherein a quantum number and a modulo counter are used to time-stamp the data.
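A C sketch of the masked read path follows. The hash functions, the four-way alternative count, and the `probe` callback are assumptions made for illustration; the quantum/modulo time-stamping of the allocation policy is elided.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_ALT   4                 /* assumed alternative-address count */
#define IDX_BITS  10                /* assumed cache-index width */
#define NUM_SLOTS (1u << IDX_BITS)

/* Hash the cache index into one of NUM_ALT alternative cache addresses.
 * These hash functions are illustrative, not the patent's. */
static uint32_t hash_index(uint32_t index, unsigned alt)
{
    return (index ^ (index >> (alt + 1)) ^ (alt * 0x9e37u))
           & (NUM_SLOTS - 1);
}

/* Read path: a mask with one bit per alternative says which hashed
 * locations are worth probing for this reference.  `probe` stands in
 * for the tag compare and validity check at a slot; it returns true
 * on a hit.  Returns the hit slot, or -1 on a miss. */
int cache_lookup(uint32_t index, uint8_t mask,
                 bool (*probe)(uint32_t slot))
{
    for (unsigned a = 0; a < NUM_ALT; a++) {
        if (!(mask & (1u << a)))
            continue;               /* masked out: skip this alternative */
        uint32_t slot = hash_index(index, a);
        if (probe(slot))
            return (int)slot;       /* data found at this alternative */
    }
    return -1;                      /* miss at every selected alternative */
}
```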
Abstract:
A pipelined processor includes an instruction box having a register mapper, which maps register operand fields of a set of instructions, and an instruction scheduler, fed by the set of instructions, which reorders the issuance of those instructions from the instruction processor. The mapped register operand fields are associated with the corresponding instructions of the reordered set prior to issuance. The processor further includes a branch prediction table that maps a stored pattern of past histories associated with a branch instruction to the more likely direction of that branch. The processor also includes a memory reference tagging store associated with the instruction scheduler, so that the scheduler can reorder memory reference instructions without knowing the actual memory locations they address.
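The history-pattern mapping is in the spirit of a two-level adaptive predictor. The C sketch below is a generic model of that idea, not the patent's exact structure; the table sizes and names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define HIST_BITS 8                  /* assumed history-pattern length */
#define PHT_SIZE  (1u << HIST_BITS)

/* A stored pattern of past taken/not-taken outcomes indexes a table of
 * 2-bit saturating counters giving the more likely direction. */
static uint8_t  pattern_history[PHT_SIZE];   /* 2-bit counters (0..3) */
static uint32_t global_history;              /* recent branch outcomes */

/* Predict taken when the counter is in a "taken" state (2 or 3). */
bool predict_branch(void)
{
    return pattern_history[global_history & (PHT_SIZE - 1)] >= 2;
}

/* Train: nudge the counter toward the actual outcome and shift the
 * outcome into the history pattern. */
void update_branch(bool taken)
{
    uint8_t *ctr = &pattern_history[global_history & (PHT_SIZE - 1)];
    if (taken  && *ctr < 3) (*ctr)++;
    if (!taken && *ctr > 0) (*ctr)--;
    global_history = (global_history << 1) | (taken ? 1u : 0u);
}
```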