摘要:
A data processing system includes a processor core and a memory subsystem. The memory subsystem includes a store queue having a plurality of entries, where each entry includes an address field for holding the target address of store operation, a data field for holding data for the store operation, and a virtual sync field indicating a presence or absence of a synchronizing operation associated with the entry. The memory subsystem further includes a store queue controller that, responsive to receipt at the memory subsystem of a sequence of operations including a synchronizing operation and a particular store operation, places a target address and data of the particular store operation within the address field and data field, respectively, of an entry in the store queue and sets the virtual sync field of the entry to represent the synchronizing operation, such that a number of store queue entries utilized is reduced.
摘要:
A data processing system includes a memory subsystem and an execution unit, coupled to the memory subsystem, which executes store instructions to determine target memory addresses of store operations to be performed by the memory subsystem. The data processing system further includes a mode field having a first setting indicating strong ordering between store operations and a second setting indicating weak ordering between store operations. Store operations accessing the memory subsystem are associated with either the first setting or the second setting. The data processing system also includes logic that, based upon settings of the mode field, inserts a synchronizing operation between a store operation associated with the first setting and a store operation associated with the second setting, such that all store operations preceding the synchronizing operation complete before store operations subsequent to the synchronizing operation.
摘要:
A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting individual bits of the byte enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When full entries are selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when data goes state before write permission is obtained by the RC machine.
摘要:
A method and processor system that substantially enhances the store gathering capabilities of a store queue entry to enable gathering of a maximum number of proximate-in-time store operations before the entry is selected for dispatch. A counter is provided for each entry to track a time since a last gather to the entry. When a new gather does not occur before the counter reaches a threshold saturation point, the entry is signaled ready for dispatch. By defining an optimum threshold saturation point before the counter expires, sufficient time is provided for the entry to gather a proximate-in-time store operation. The entry may be deemed eligible for selection when certain conditions occur, including the entry becoming full, issuance of a barrier operation, and saturation of the counter. The use of the counter increases the ability of a store queue entry to complete gathering of enough store operations to update an entire cache line before that entry is dispatched to an RC machine.
摘要:
A data processing system includes at least first and second coherency domains, each including at least one processor core and a memory. In response to an initialization operation by a processor core that indicates a target memory block to be initialized, a cache memory in the first coherency domain determines a coherency state of the target memory block with respect to the cache memory. In response to the determination, the cache memory selects a scope of broadcast of an initialization request identifying the target memory block. A narrower scope including the first coherency domain and excluding the second coherency domain is selected in response to a determination of a first coherency state, and a broader scope including the first coherency domain and the second coherency domain is selected in response to a determination of a second coherency state. The cache memory then broadcasts an initialization request with the selected scope. In response to the initialization request, the target memory block is initialized within a memory of the data processing system to an initialization value.
摘要:
A data processing system includes a first plane including a first plurality of processing nodes, each including multiple processing units, and a second plane including a second plurality of processing nodes, each including multiple processing units. The data processing system also includes a plurality of point-to-point first tier links. Each of the first plurality and second plurality of processing nodes includes one or more first tier links among the plurality of first tier links, where the first tier link(s) within each processing node connect a pair of processing units in the same processing node for communication. The data processing system further includes a plurality of point-to-point second tier links. At least a first of the plurality of second tier links connects processing units in different ones of the first plurality of processing nodes, at least a second of the plurality of second tier links connects processing units in different ones of the second plurality of processing nodes, and at least a third of the plurality of second tier links connects a processing unit in the first plane to a processing unit in the second plane.
摘要:
A processor communication register (PCR) contained within a multiprocessor cluster system provides enhanced processor communication. The PCR stores information that is useful in pipelined or parallel multi-processing. Each processor cluster has exclusive rights to store to a sector within the PCR and has continuous access to read its contents. Each processor cluster updates its exclusive sector within the PCR, instantly allowing all of the other processors within the cluster network to see the change within the PCR data, and bypassing the cache subsystem. Efficiency is enhanced within the processor cluster network by providing processor communications to be immediately networked and transferred into all processors without momentarily restricting access to the information or forcing all the processors to be continually contending for the same cache line, and thereby overwhelming the interconnect and memory system with an endless stream of load, store and invalidate commands.
摘要:
A method for informing a processor of a selected order of transmission of data to the processor. The method comprises the steps of coupling system components via a data bus to the processor to effectuate data transfer, determining at the system component logic the order in which to transmit data to the processor, and issuing to the data bus a selected order bit concurrent with the data, wherein the selected order bit alerts the processor of the order and the data is transmitted in that order. In a preferred embodiment, the system component is the cache and the method may involve receiving at the cache a preference of ordering for a read address/request from the processor. The preference order logic of the cache controller or a preference order logic component evaluates the preference of ordering desired by comparing the processor preference with other preferences, including cache order preference. One preference order is selected and the data is then retrieved from a cache line of the cache in the order selected.
摘要:
A system, method, and computer program product for semi-synchronously copying data from a first portion of memory to a second portion of memory are disclosed. The method comprises receiving, in a processor, a call for a semi-synchronous memory copy operation. The semi-synchronous memory copy operation preserves temporal persistence of validity for a virtual source address corresponding to a source location in a memory and a virtual target address corresponding to a target location in the memory by setting a flag bit. The call includes at least the virtual source address, the virtual target address, and an indicator identifying a number of bytes to be copied. The memory copy operation is placed in a queue for execution by a memory controller. The queue is coupled to the memory controller. At least one subsequent instruction is continued to be executed as the subsequent instruction becomes available from an instruction pipeline.
摘要:
A system, method, and a computer readable for protecting content of a memory page are disclosed. The method includes determining a start of a semi-synchronous memory copy operation. A range of addresses is determined where the semi-synchronous memory copy operation is being performed. An issued instruction that removes a page table entry is detected. The method further includes determining whether the issued instruction is destined to remove a page table entry associated with at least one address in the range of addresses. In response to the issued instruction being destined to remove the page table entry, the execution of the issued instruction is stalled until the semi-synchronous memory copy operation is completed.