摘要:
A processor includes at least one instruction execution unit that executes store instructions to obtain store operations and a store queue coupled to the instruction execution unit. The store queue includes a queue entry in which the store queue gathers multiple store operations during a store gathering window to obtain a data portion of a write transaction directed to lower level memory. In addition, the store queue includes dispatch logic that varies a size of the store gathering window to optimize store performance for different store behaviors and workloads.
摘要:
A processor includes at least one instruction execution unit that executes store instructions to obtain store operations and a store queue coupled to the instruction execution unit. The store queue includes a queue entry in which the store queue gathers multiple store operations during a store gathering window to obtain a data portion of a write transaction directed to lower level memory. In addition, the store queue includes dispatch logic that varies a size of the store gathering window to optimize store performance for different store behaviors and workloads.
摘要:
A processor includes at least one instruction execution unit that executes store instructions to obtain store operations and a store queue coupled to the instruction execution unit. The store queue includes a queue entry in which the store queue gathers multiple store operations during a store gathering window to obtain a data portion of a write transaction directed to lower level memory. In addition, the store queue includes dispatch logic that varies a size of the store gathering window to optimize store performance for different store behaviors and workloads.
摘要:
A method and processor chip design for enabling a processor core to continue sending store operations speculatively to the store queue after the core receives indication that the store queue is full. The processor core is configured with speculative store logic that enables the processor core to continue issuing store operations while the store queue full signal is asserted. A copy of the speculatively issued store operation is placed within a speculative store buffer. The core waits for a signal from the store queue indicating the store operation was accepted into the store queue. When the speculatively-issued store operation is accepted within the store queue, the copy is discarded from the buffer. However, when the store operation is rejected, the speculative store logic re-issues the store operation ahead of normal store operations.
摘要:
A cache, system and method for reducing the number of rejected snoop requests. A “stall/reorder unit” in a cache receives a snoop request from an interconnect. The snoop request is entered in the first available latch of the stall/reorder unit unless the stall/reorder unit is full in which case the new snoop request is transmitted to a second unit configured to transmit a request to retry resending the new snoop request. Snoop requests have a higher priority than requests from processors and snoop requests are selected by the arbitration mechanism over processor requests unless the arbitration mechanism requests otherwise (“stall request”) to the stall/reorder unit. By snoop requests having a higher priority than processor requests, the number of snoop requests rejected is reduced. By having the arbitration mechanism issue a stall request, the processor will not be starved.
摘要:
A cache, system and method for reducing the number of rejected snoop requests. An incoming snoop request is entered in the first available latch in a pipeline of latches in a stall/reorder unit if the stall/reorder unit is not full. The entered snoop request is dispatched to a selector upon entering a bottom latch in the pipeline. The stall/reorder unit is not informed as to whether the dispatched snoop request is accepted by an arbitration mechanism for several clock cycles after the dispatch occurred. A copy of the dispatched snoop request is stored in a top latch in an overrun pipeline of latches in the first unit upon dispatching the snoop request. By maintaining information about the snoop request, the snoop request may be dispatched again to the selector in case the dispatched snoop request was rejected thereby increasing the chance that the snoop request will ultimately be accepted.
摘要:
A method, system, and processor chip design for reducing the latency between completing a LARX operation and receiving the associated STCX operation to complete the update to the cache line. Each entry of the store queue of the issuing processor is provided an additional tracking bit (priority bit). The priority bit is set whenever a STCX operation is placed within the entry. During selection of an entry for dispatch by the arbitration logic, the arbitration logic scans the value of the priority bits of each eligible entry. An entry with the priority bit set is given priority in the selection process within architectural rules. That entry is then selected for dispatch as early as is possible within the established rules.
摘要:
A method and processor system that substantially enhances the store gathering capabilities of a store queue entry to enable gathering of a maximum number of proximate-in-time store operations before the entry is selected for dispatch. A counter is provided for each entry to track a time since a last gather to the entry. When a new gather does not occur before the counter reaches a threshold saturation point, the entry is signaled ready for dispatch. By defining an optimum threshold saturation point before the counter expires, sufficient time is provided for the entry to gather a proximate-in-time store operation. The entry may be deemed eligible for selection when certain conditions occur, including the entry becoming full, issuance of a barrier operation, and saturation of the counter. The use of the counter increases the ability of a store queue entry to complete gathering of enough store operations to update an entire cache line before that entry is dispatched to an RC machine.
摘要:
A system and method for re-ordering store operations from a processor core to a store queue. When a store queue receives a new processor-issued store operation from the processor core, a store queue controller allocates a new entry in the store queue. In response to allocating the new entry in the store queue, the store queue controller determines whether or not the new entry is dependent on at least one other valid entry in the store queue. In response to determining the new entry is dependent on at least one other valid entry in the store queue, the store queue controller inhibits requesting of the new entry to the RC dispatch logic until each valid entry on which the new entry is dependent has been successfully dispatched to an RC machine by the RC dispatch logic.
摘要:
A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting individual bits of the byte enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When full entries are selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when data goes state before write permission is obtained by the RC machine.