摘要:
A data processing system includes a processor core and a memory subsystem. The memory subsystem includes a store queue having a plurality of entries, where each entry includes an address field for holding the target address of store operation, a data field for holding data for the store operation, and a virtual sync field indicating a presence or absence of a synchronizing operation associated with the entry. The memory subsystem further includes a store queue controller that, responsive to receipt at the memory subsystem of a sequence of operations including a synchronizing operation and a particular store operation, places a target address and data of the particular store operation within the address field and data field, respectively, of an entry in the store queue and sets the virtual sync field of the entry to represent the synchronizing operation, such that a number of store queue entries utilized is reduced.
摘要:
A data processing system includes a memory subsystem and an execution unit, coupled to the memory subsystem, which executes store instructions to determine target memory addresses of store operations to be performed by the memory subsystem. The data processing system further includes a mode field having a first setting indicating strong ordering between store operations and a second setting indicating weak ordering between store operations. Store operations accessing the memory subsystem are associated with either the first setting or the second setting. The data processing system also includes logic that, based upon settings of the mode field, inserts a synchronizing operation between a store operation associated with the first setting and a store operation associated with the second setting, such that all store operations preceding the synchronizing operation complete before store operations subsequent to the synchronizing operation.
摘要:
A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting individual bits of the byte enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When full entries are selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when data goes state before write permission is obtained by the RC machine.
摘要:
A data processing system includes at least first and second coherency domains, each including at least one processor core and a memory. In response to an initialization operation by a processor core that indicates a target memory block to be initialized, a cache memory in the first coherency domain determines a coherency state of the target memory block with respect to the cache memory. In response to the determination, the cache memory selects a scope of broadcast of an initialization request identifying the target memory block. A narrower scope including the first coherency domain and excluding the second coherency domain is selected in response to a determination of a first coherency state, and a broader scope including the first coherency domain and the second coherency domain is selected in response to a determination of a second coherency state. The cache memory then broadcasts an initialization request with the selected scope. In response to the initialization request, the target memory block is initialized within a memory of the data processing system to an initialization value.
摘要:
A data processing system includes a first plane including a first plurality of processing nodes, each including multiple processing units, and a second plane including a second plurality of processing nodes, each including multiple processing units. The data processing system also includes a plurality of point-to-point first tier links. Each of the first plurality and second plurality of processing nodes includes one or more first tier links among the plurality of first tier links, where the first tier link(s) within each processing node connect a pair of processing units in the same processing node for communication. The data processing system further includes a plurality of point-to-point second tier links. At least a first of the plurality of second tier links connects processing units in different ones of the first plurality of processing nodes, at least a second of the plurality of second tier links connects processing units in different ones of the second plurality of processing nodes, and at least a third of the plurality of second tier links connects a processing unit in the first plane to a processing unit in the second plane.
摘要:
A method and processor chip design for enabling a processor core to continue sending store operations speculatively to the store queue after the core receives indication that the store queue is full. The processor core is configured with speculative store logic that enables the processor core to continue issuing store operations while the store queue full signal is asserted. A copy of the speculatively issued store operation is placed within a speculative store buffer. The core waits for a signal from the store queue indicating the store operation was accepted into the store queue. When the speculatively-issued store operation is accepted within the store queue, the copy is discarded from the buffer. However, when the store operation is rejected, the speculative store logic re-issues the store operation ahead of normal store operations.
摘要:
A cache, system and method for reducing the number of rejected snoop requests. An incoming snoop request is entered in the first available latch in a pipeline of latches in a stall/reorder unit if the stall/reorder unit is not full. The entered snoop request is dispatched to a selector upon entering a bottom latch in the pipeline. The stall/reorder unit is not informed as to whether the dispatched snoop request is accepted by an arbitration mechanism for several clock cycles after the dispatch occurred. A copy of the dispatched snoop request is stored in a top latch in an overrun pipeline of latches in the first unit upon dispatching the snoop request. By maintaining information about the snoop request, the snoop request may be dispatched again to the selector in case the dispatched snoop request was rejected thereby increasing the chance that the snoop request will ultimately be accepted.
摘要:
A system and method for re-ordering store operations from a processor core to a store queue. When a store queue receives a new processor-issued store operation from the processor core, a store queue controller allocates a new entry in the store queue. In response to allocating the new entry in the store queue, the store queue controller determines whether or not the new entry is dependent on at least one other valid entry in the store queue. In response to determining the new entry is dependent on at least one other valid entry in the store queue, the store queue controller inhibits requesting of the new entry to the RC dispatch logic until each valid entry on which the new entry is dependent has been successfully dispatched to an RC machine by the RC dispatch logic.
摘要:
A cache, system and method for reducing the number of rejected snoop requests. A “stall/reorder unit” in a cache receives a snoop request from an interconnect. The snoop request is entered in the first available latch of the stall/reorder unit unless the stall/reorder unit is full in which case the new snoop request is transmitted to a second unit configured to transmit a request to retry resending the new snoop request. Snoop requests have a higher priority than requests from processors and snoop requests are selected by the arbitration mechanism over processor requests unless the arbitration mechanism requests otherwise (“stall request”) to the stall/reorder unit. By snoop requests having a higher priority than processor requests, the number of snoop requests rejected is reduced. By having the arbitration mechanism issue a stall request, the processor will not be starved.
摘要:
A method for informing a processor of a selected order of transmission of data to the processor. The method comprises the steps of coupling system components via a data bus to the processor to effectuate data transfer, determining at the system component logic the order in which to transmit data to the processor, and issuing to the data bus a selected order bit concurrent with the data, wherein the selected order bit alerts the processor of the order and the data is transmitted in that order. In a preferred embodiment, the system component is the cache and the method may involve receiving at the cache a preference of ordering for a read address/request from the processor. The preference order logic of the cache controller or a preference order logic component evaluates the preference of ordering desired by comparing the processor preference with other preferences, including cache order preference. One preference order is selected and the data is then retrieved from a cache line of the cache in the order selected.