Abstract:
Disclosed herein are a data processing system and method of operating a data processing system that arbitrate between conflicting requests to modify data cached in a shared state and that protect ownership of the cache line granted during such arbitration until modification of the data is complete. The data processing system includes a plurality of agents coupled to an interconnect that supports pipelined transactions. While data associated with a target address are cached at a first agent among the plurality of agents in a shared state, the first agent issues a transaction on the interconnect. In response to snooping the transaction, a second agent provides a snoop response indicating that the second agent has a pending conflicting request and a coherency decision point provides a snoop response granting the first agent ownership of the data. In response to the snoop responses, the first agent is provided with a combined response representing a collective response to the transaction of all of the agents that grants the first agent ownership of the data. In response to the combined response, the first agent is permitted to modify the data.
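
A minimal C sketch of the combined-response logic described above, under the assumption that a grant from the coherency decision point prevails over another agent's conflict response; the names (snoop_resp_t, combine_responses) are illustrative, not taken from the disclosure.

    #include <stdio.h>

    /* Illustrative encoding of the snoop responses described above. */
    typedef enum {
        RESP_NULL,           /* agent has no interest in the line         */
        RESP_RETRY_CONFLICT, /* agent has a pending conflicting request   */
        RESP_GRANT_OWNER     /* coherency decision point grants ownership */
    } snoop_resp_t;

    /* Combine the individual snoop responses into the collective response
       returned to the requester.  A grant from the coherency decision
       point dominates a conflict response, so the requester is given
       ownership and may modify the line once this response arrives. */
    snoop_resp_t combine_responses(const snoop_resp_t *resp, int n_agents)
    {
        snoop_resp_t combined = RESP_NULL;
        for (int i = 0; i < n_agents; i++) {
            if (resp[i] == RESP_GRANT_OWNER)
                return RESP_GRANT_OWNER;
            if (resp[i] == RESP_RETRY_CONFLICT)
                combined = RESP_RETRY_CONFLICT;
        }
        return combined;
    }

    int main(void)
    {
        /* One snooper reports a conflicting request, but the coherency
           decision point grants ownership anyway. */
        snoop_resp_t resp[] = { RESP_NULL, RESP_RETRY_CONFLICT, RESP_GRANT_OWNER };
        if (combine_responses(resp, 3) == RESP_GRANT_OWNER)
            printf("requester owns the line and may modify the data\n");
        return 0;
    }
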
Abstract:
A method of operation within a processor that permits a load instruction following a barrier instruction in an instruction sequence to be issued speculatively. The barrier instruction is executed, and while the corresponding barrier operation is pending, a load request associated with the load instruction is speculatively issued. A speculation flag is set to indicate that the load instruction was speculatively issued. The flag is reset when an acknowledgment of the barrier operation is received. Data that is returned before the acknowledgment is received is temporarily held, and the data is forwarded to the register and/or execution unit of the processor only after the acknowledgment is received. If a snoop invalidate is detected for the speculatively issued load request before the barrier operation completes, the data is discarded and the load request is re-issued.
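
A minimal C sketch of the speculation flag and the temporarily held data, assuming a simple per-load tracking structure; the names (spec_load_t, on_barrier_ack) and the printf calls standing in for hardware actions are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Assumed per-load tracking state, not the disclosed hardware. */
    typedef struct {
        bool     spec_flag;   /* set: load issued while barrier pending */
        bool     data_valid;  /* data returned but not yet forwarded    */
        bool     invalidated; /* snoop invalidate hit this load         */
        uint32_t held_data;   /* data held until the barrier completes  */
    } spec_load_t;

    /* Snoop invalidate detected before the barrier operation completes:
       mark the load so its held data is discarded and it is re-issued. */
    void on_snoop_invalidate(spec_load_t *ld)
    {
        ld->invalidated = true;
    }

    /* Barrier acknowledgment received: reset the speculation flag and
       either forward the held data or discard it and re-issue the load. */
    void on_barrier_ack(spec_load_t *ld)
    {
        ld->spec_flag = false;
        if (ld->invalidated)
            printf("discard held data and re-issue the load\n");
        else if (ld->data_valid)
            printf("forward 0x%08x to register/execution unit\n",
                   (unsigned)ld->held_data);
    }

    int main(void)
    {
        spec_load_t ld = { .spec_flag = true };  /* speculative issue  */
        ld.held_data  = 0x1234ABCD;              /* data returns early */
        ld.data_valid = true;
        on_barrier_ack(&ld);                     /* barrier completes  */
        return 0;
    }
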
Abstract:
Disclosed is a method of operation within a processor that permits load instructions to be issued speculatively. An instruction sequence is received that includes multiple barrier instructions and a load instruction that follows the barrier instructions in the instruction sequence. In response to the multiple barrier instructions, barrier operations are issued on an interconnect coupled to the processor. Also, while the barrier operations are pending, a load request associated with the load instruction is speculatively issued. When the load request is issued, a flag is set to indicate that it was speculatively issued. The flag is reset when acknowledgments of all the barrier operations are received. Data that is returned before the acknowledgments are received is temporarily held and forwarded to the register and/or execution unit of the processor only after the acknowledgments are received. If a snoop invalidate is detected for the speculatively issued load request before completion of the barrier operations, the data is discarded and the load request is re-issued.
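
When several barrier operations are outstanding, the single acknowledgment above generalizes to a count. A sketch, again with assumed names, in which the flag is cleared and held data released only when the last acknowledgment arrives:

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        int  barriers_pending; /* barrier operations awaiting acknowledgment */
        bool spec_flag;        /* a load request was issued speculatively    */
    } barrier_tracker_t;

    void issue_barrier(barrier_tracker_t *t) { t->barriers_pending++; }

    /* Each acknowledgment decrements the count; held load data is
       forwarded only after acknowledgments of ALL barrier operations. */
    void ack_barrier(barrier_tracker_t *t)
    {
        if (--t->barriers_pending == 0 && t->spec_flag) {
            t->spec_flag = false;
            printf("all barriers acknowledged: forward held load data\n");
        }
    }

    int main(void)
    {
        barrier_tracker_t t = {0};
        issue_barrier(&t);
        issue_barrier(&t);   /* two barrier instructions in the sequence */
        t.spec_flag = true;  /* load issued speculatively past both      */
        ack_barrier(&t);     /* first acknowledgment: still pending      */
        ack_barrier(&t);     /* last acknowledgment: data forwarded      */
        return 0;
    }
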
Abstract:
A method of handling a write operation in a multiprocessor computer system wherein each processing unit has a respective cache, by determining that a new value for a store instruction is the same as a current value already contained in the memory hierarchy, and discarding the store instruction without issuing any associated cache operation in response to this determination. When a store hit occurs, the current value is retrieved from the local cache. When a store miss occurs, the current value is retrieved from a remote cache by issuing a read request. The comparison may be performed using a portion of the cache line which is less than a granule size of the cache line. A store gathering queue can be used to collect pending store instructions that are directed to different portions of the same cache line.
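
A sketch of the value comparison in C, assuming the comparison granule is just the bytes the store writes (smaller than the full cache line); store_is_silent is a hypothetical helper, not a name from the disclosure.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Return true if the store may be discarded: the new bytes equal the
       current value already held at that offset, whether the line came
       from the local cache (store hit) or was read from a remote cache
       (store miss). */
    bool store_is_silent(const uint8_t *line, size_t offset,
                         const uint8_t *new_bytes, size_t len)
    {
        return memcmp(line + offset, new_bytes, len) == 0;
    }

    int main(void)
    {
        uint8_t cache_line[64] = {0};
        cache_line[8] = 0x42;        /* current value in the hierarchy */

        uint8_t store_val = 0x42;    /* new value is identical         */
        if (store_is_silent(cache_line, 8, &store_val, sizeof store_val))
            printf("discard store: no cache operation issued\n");
        else
            printf("perform the store normally\n");
        return 0;
    }
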
Abstract:
A method of maintaining coherency in a cache hierarchy of a processing unit of a computer system, wherein the upper level (L1) cache includes a split instruction/data cache. In one implementation, the L1 data cache is store-through, and each processing unit has a lower level (L2) cache. When the lower level cache receives a cache operation requiring invalidation of a program instruction in the L1 instruction cache (i.e., a store operation or a snooped kill), the L2 cache sends an invalidation transaction (e.g., icbi) to the instruction cache. The L2 cache is fully inclusive of both instructions and data. In another implementation, the L1 data cache is write-back, and a store address queue in the processor core is used to continually propagate pipelined address sequences to the lower levels of the memory hierarchy, i.e., to an L2 cache or, if there is no L2 cache, then to the system bus. If there is no L2 cache, then the cache operations may be snooped directly against the L1 instruction cache.
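
A sketch of the store-through case in C: the fully inclusive L2 forwards an icbi-style invalidation to the L1 instruction cache whenever it receives an operation that may have modified instructions. The types and the printf standing in for the invalidation transaction are assumptions.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef enum { OP_STORE, OP_SNOOPED_KILL } l2_op_t;

    /* Stand-in for the invalidation transaction (e.g., icbi) sent from
       the L2 up to the split L1 instruction cache. */
    void send_icbi_to_l1_icache(uint64_t addr)
    {
        printf("icbi 0x%llx -> L1 instruction cache\n",
               (unsigned long long)addr);
    }

    /* Because the L2 is fully inclusive of both instructions and data,
       an L2 hit is enough to know the L1 instruction cache may hold a
       now-stale copy of the line. */
    void l2_receive_op(l2_op_t op, uint64_t addr, bool l2_hit)
    {
        if ((op == OP_STORE || op == OP_SNOOPED_KILL) && l2_hit)
            send_icbi_to_l1_icache(addr);
    }

    int main(void)
    {
        l2_receive_op(OP_STORE, 0x1000, true);        /* store into code */
        l2_receive_op(OP_SNOOPED_KILL, 0x2000, true); /* snooped kill    */
        return 0;
    }
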
Abstract:
Following a cache miss by an operation, the address for the operation is transmitted on the bus coupling the cache to lower levels of the storage hierarchy. A portion of the address including the index field is transmitted during a first bus cycle and may be employed to begin directory lookups in lower level storage devices before the address tag is received. The remainder of the address is transmitted during subsequent bus cycles, arriving in time for address tag comparisons with the congruence class elements. To allow multiple directory lookups to occur concurrently in a pipelined directory, a portion of each of several addresses for data access operations, each portion including the index field for the respective address, may be transmitted during the first bus cycle or staged in consecutive bus cycles, with the remainder of each address, including the address tag, transmitted during subsequent bus cycles. This allows directory lookups utilizing the index fields to be processed concurrently within a lower level storage device for multiple operations, with the address tags provided later but still in time for tag comparisons at the end of the directory lookup. Where the lower level storage device operates at a higher frequency than the bus, overall latency is reduced and directory bandwidth is utilized more efficiently.
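
A sketch of the address split in C, with arbitrary assumed field widths (64-byte lines, 1024 congruence classes): the index field goes out in the first bus cycle so the directory lookup can start, and the tag follows in later cycles for the comparison.

    #include <stdint.h>
    #include <stdio.h>

    #define OFFSET_BITS 6   /* 64-byte cache line (assumed)      */
    #define INDEX_BITS 10   /* 1024 congruence classes (assumed) */

    static uint32_t index_of(uint64_t addr)
    {
        return (uint32_t)(addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    }

    static uint64_t tag_of(uint64_t addr)
    {
        return addr >> (OFFSET_BITS + INDEX_BITS);
    }

    int main(void)
    {
        uint64_t addrs[] = { 0x12345678, 0x9ABCDEF0 }; /* two misses */

        /* First bus cycle: transmit the index fields so the lower level
           directory can begin pipelined lookups for both operations.   */
        for (int i = 0; i < 2; i++)
            printf("cycle 1: op %d index 0x%03x (lookup starts)\n",
                   i, index_of(addrs[i]));

        /* Subsequent cycles: transmit the address tags, arriving in time
           for the tag comparison at the end of each directory lookup.   */
        for (int i = 0; i < 2; i++)
            printf("cycle %d: op %d tag 0x%llx (tag compare)\n",
                   i + 2, i, (unsigned long long)tag_of(addrs[i]));
        return 0;
    }
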
Abstract:
A method and apparatus for efficiently managing a cache with a non-power-of-two number of congruence classes allows the number of congruence classes to be increased when insufficient area is available to double the cache size. One or more congruence classes within the cache have their associative sets split, creating additional congruence classes with reduced associativity. The management method and apparatus allow access to the congruence classes without introducing any additional cycles of delay or complex logic.
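
One way to picture the mapping, as a hypothetical C sketch: start from 1024 8-way classes, split the first 512 of them in two using one extra address bit, and end with 1536 classes (512 still 8-way, 1024 now 4-way). All parameters and the class numbering are assumptions; the point is that the mapping is a single compare-and-mux, adding no extra cycles.

    #include <stdint.h>
    #include <stdio.h>

    #define SPLIT_POINT 512 /* original classes below this are split (assumed) */

    /* Map a 10-bit index (plus one extra address bit, used only in the
       split region) to an effective class number and its associativity. */
    static void map_class(uint32_t index10, uint32_t extra_bit,
                          uint32_t *cls, int *ways)
    {
        if (index10 < SPLIT_POINT) {
            /* split region: two 4-way classes per original class */
            *cls  = SPLIT_POINT + index10 * 2 + extra_bit;
            *ways = 4;
        } else {
            /* unsplit region: one full 8-way class */
            *cls  = index10 - SPLIT_POINT;
            *ways = 8;
        }
    }

    int main(void)
    {
        uint32_t cls;
        int ways;

        map_class(100, 1, &cls, &ways);  /* falls in the split region   */
        printf("index 100, extra bit 1 -> class %u (%d-way)\n", cls, ways);

        map_class(800, 0, &cls, &ways);  /* falls in the unsplit region */
        printf("index 800              -> class %u (%d-way)\n", cls, ways);
        return 0;
    }
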
Abstract:
A method of improving memory access for a computer system, by sending load requests to a lower level storage subsystem along with associated information pertaining to intended use of the requested information by the requesting processor, without using a high level load queue. Returning the requested information to the processor along with the associated use information allows the information to be placed immediately without using reload buffers. A register load bus separate from the cache load bus (and having a smaller granularity) is used to return the information. An upper level (L1) cache may then be imprecisely reloaded (the upper level cache can also be imprecisely reloaded with store instructions). The lower level (L2) cache can monitor L1 and L2 cache activity, which can be used to select a victim cache block in the L1 cache (based on the additional L2 information), or to select a victim cache block in the L2 cache (based on the additional L1 information). L2 control of the L1 directory also allows certain snoop requests to be resolved without waiting for L1 acknowledgement. The invention can be applied to, e.g., instruction, operand data and translation caches.
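
A C sketch of a load request carrying its intended-use information down the hierarchy so the returning data can be placed immediately, with no high-level load queue or reload buffer; every field and function name here is an illustrative assumption.

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed shape of a load request annotated with use information. */
    typedef struct {
        uint64_t addr;     /* requested address                    */
        uint8_t  dest_reg; /* register the data is destined for    */
        uint8_t  size;     /* granularity of the register load bus */
    } load_req_t;

    /* The lower level (L2) echoes the use information back with the
       data, so the core can route it straight to the register and/or
       execution unit on the narrower register load bus.  The L1 may be
       reloaded imprecisely from the separate cache load bus, or not at
       all. */
    void l2_return_data(const load_req_t *req, uint64_t data)
    {
        printf("register load bus: r%u <- 0x%llx (%u bytes)\n",
               req->dest_reg, (unsigned long long)data, (unsigned)req->size);
    }

    int main(void)
    {
        load_req_t req = { .addr = 0x4000, .dest_reg = 5, .size = 8 };
        l2_return_data(&req, 0x00000000DEADBEEFULL);
        return 0;
    }
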