Abstract:
A data processing system includes at least first and second nodes and a segmented interconnect having coupled first and second segments. The first node includes the first segment and first and second agents coupled to the first segment, and the second node includes the second segment and a third agent coupled to the second segment. The first node further includes cancellation logic that, in response to the first agent issuing a request on the segmented interconnect that propagates from the first segment to the second segment and the second agent indicating ability to service the request, sends a cancellation message to the third agent instructing the third agent to ignore the request.
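As an illustration of the cancellation decision described above, the following is a minimal C sketch. The types and names (request_t, can_service, send-cancel message format) and the agent numbering are hypothetical; the abstract does not specify an implementation.

```c
/* Minimal sketch of the cancellation decision; types and the
 * can_service predicate are illustrative assumptions. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int  source_agent;   /* agent that issued the request            */
    int  address;        /* target address of the request            */
    bool propagated;     /* request already forwarded to segment 2   */
} request_t;

/* Hypothetical predicate: can a local agent service this address? */
static bool can_service(int agent, int address) {
    return (address % 2) == (agent % 2);   /* stand-in for a real snoop */
}

/* Cancellation logic in the first node: if a second local agent can
 * service the request after it has crossed into the second segment,
 * tell the remote agent to ignore it. */
static void cancellation_logic(const request_t *req, int local_agent,
                               int remote_agent) {
    if (req->propagated && can_service(local_agent, req->address))
        printf("cancel: agent %d may ignore request for %d\n",
               remote_agent, req->address);
}

int main(void) {
    request_t req = { .source_agent = 0, .address = 4, .propagated = true };
    cancellation_logic(&req, /*local_agent=*/2, /*remote_agent=*/3);
    return 0;
}
```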
Abstract:
A processor includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of caches at a same level. The caches, which store data utilized by the execution unit, have diverse cache hardware, and each preferably stores only data having associated addresses within a respective one of a plurality of subsets of an address space. The diverse cache hardware can include, for example, differing cache sizes, differing associativities, differing sectoring, and differing inclusivities.
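The address-subset routing can be pictured as below: a minimal C sketch assuming a power-of-two number of caches and a selector drawn from low-order address bits above a 64-byte line offset, neither of which the abstract specifies.

```c
/* Sketch: routing an address to one of several same-level caches by a
 * fixed partition of the address space. Cache geometries are made up
 * purely to illustrate "diverse cache hardware". */
#include <stdint.h>
#include <stdio.h>

#define NUM_CACHES 4

typedef struct {
    unsigned size_kb;        /* caches may differ in size ...      */
    unsigned associativity;  /* ... associativity, sectoring, etc. */
} cache_t;

static const cache_t caches[NUM_CACHES] = {
    { 32, 4 }, { 64, 8 }, { 32, 2 }, { 16, 8 }   /* diverse hardware */
};

/* Each cache stores only addresses in one subset; here the subset is
 * chosen by two low-order bits above the 64-byte line offset. */
static unsigned select_cache(uint64_t addr) {
    return (addr >> 6) & (NUM_CACHES - 1);
}

int main(void) {
    uint64_t addr = 0x12345680;
    unsigned c = select_cache(addr);
    printf("addr 0x%llx -> cache %u (%u KB, %u-way)\n",
           (unsigned long long)addr, c,
           caches[c].size_kb, caches[c].associativity);
    return 0;
}
```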
Abstract:
A data processing system includes a plurality of nodes, which each contain at least one agent and each have an associated node identifier, and memory distributed among the plurality of nodes. The data processing system further includes an interconnect containing a segmented data channel, where each node contains a segment of the segmented data channel and each segment is coupled to at least one other segment by destination logic. In response to snooping a write request of a master agent on the interconnect, a target agent that will service the write request places its node identifier in a snoop response. When the master agent receives the combined response, which contains the node identifier of the target agent, the master agent issues on the segmented data channel a write data transaction specifying the node identifier of the target agent as a destination identifier. In response to receipt of the write data transaction, the destination logic transmits the write data transaction to a next segment only if the destination identifier does not match a node identifier associated with a node containing a current segment.
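A sketch of the per-segment destination logic follows. The ring arrangement of segments and the names are assumptions made for illustration; the abstract only requires that each segment be coupled to at least one other segment.

```c
/* Sketch of destination logic: forward a write data transaction to
 * the next segment only while the destination node ID differs from
 * the current node's ID; deliver locally on a match. */
#include <stdio.h>

#define NUM_NODES 4

typedef struct {
    int dest_node;   /* node ID taken from the target's snoop response */
    int data;        /* payload of the write data transaction          */
} write_txn_t;

/* Assumed ring of segments, one per node. */
static void route(write_txn_t txn, int current_node) {
    while (txn.dest_node != current_node) {
        printf("node %d: forwarding to next segment\n", current_node);
        current_node = (current_node + 1) % NUM_NODES;
    }
    printf("node %d: delivering write data %d\n", current_node, txn.data);
}

int main(void) {
    write_txn_t txn = { .dest_node = 2, .data = 42 };
    route(txn, /*master's node=*/0);
    return 0;
}
```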
Abstract:
A processor having a hashed and partitioned storage subsystem includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a cache subsystem including a plurality of caches that store data utilized by the execution unit. Each cache among the plurality of caches stores only data having associated addresses within a respective one of a plurality of subsets of an address space. In one preferred embodiment, the execution units of the processor include a number of load-store units (LSUs) that each process only instructions that access data having associated addresses within a respective one of the plurality of address subsets. The processor may further be incorporated within a data processing system having a number of interconnects and a number of sets of system memory hardware that each have affinity to a respective one of the plurality of address subsets.
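The LSU-steering idea can be sketched as follows; the hash (a single address bit above the line offset) and the LSU count are illustrative assumptions, not the patent's choices.

```c
/* Sketch: steering a memory access to the LSU with affinity for its
 * address subset. */
#include <stdint.h>
#include <stdio.h>

#define NUM_LSUS 2

/* Assumed hash selecting one of the address subsets. */
static unsigned lsu_for_address(uint64_t addr) {
    return (addr >> 7) & (NUM_LSUS - 1);
}

int main(void) {
    uint64_t loads[] = { 0x1000, 0x1080, 0x2040 };
    for (unsigned i = 0; i < 3; i++)
        printf("load 0x%llx -> LSU %u\n",
               (unsigned long long)loads[i], lsu_for_address(loads[i]));
    return 0;
}
```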
Abstract:
A processor having a hashed and partitioned register file includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of registers coupled to the execution unit. The plurality of registers are partitioned into a plurality of groups, such that registers within each group can store only data having associated addresses within a respective one of a plurality of subsets of an address space.
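A minimal sketch of the partitioned register file follows, assuming two groups and a one-bit address hash; the data structure and function names are hypothetical.

```c
/* Sketch of a hashed, partitioned register file: a value loaded from
 * memory may only be placed in the register group matching the
 * address subset of its source address. */
#include <stdint.h>
#include <stdio.h>

#define NUM_GROUPS   2
#define REGS_PER_GRP 16

static uint64_t regfile[NUM_GROUPS][REGS_PER_GRP];

static unsigned group_for_address(uint64_t addr) {
    return (addr >> 6) & (NUM_GROUPS - 1);   /* assumed hash */
}

/* A load must target a register in the group owning the address. */
static void load_to_register(uint64_t addr, uint64_t value, unsigned reg) {
    unsigned g = group_for_address(addr);
    regfile[g][reg % REGS_PER_GRP] = value;
    printf("addr 0x%llx -> group %u, reg %u\n",
           (unsigned long long)addr, g, reg % REGS_PER_GRP);
}

int main(void) {
    load_to_register(0x100, 7, 3);   /* 0x100 >> 6 = 4, bit 0 = 0 -> group 0 */
    load_to_register(0x140, 9, 3);   /* 0x140 >> 6 = 5, bit 0 = 1 -> group 1 */
    return 0;
}
```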
Abstract:
A data processing system includes an interconnect, a plurality of nodes coupled to the interconnect that each include at least one agent, response logic within each node, and a queue. In response to snooping a transaction on the interconnect, each agent outputs a snoop response. In addition, the queue, which has an associated agent, allocates an entry to service the transaction. The response logic within each node accumulates a partial combined response of its node and any preceding node until a complete combined response for all of the plurality of nodes is obtained. However, prior to the associated agent receiving the complete combined response, the queue speculatively deallocates the entry if the partial combined response indicates that an agent other than the associated agent will service the transaction.
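The speculative-deallocation idea is sketched below in C; the snoop-response encoding and the accumulation rule are assumptions chosen to make the example concrete.

```c
/* Sketch of speculative queue deallocation: once the accumulating
 * partial combined response shows that some other agent will service
 * the transaction, the entry is freed before the complete combined
 * response arrives. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { RESP_NULL, RESP_SHARED, RESP_MODIFIED } resp_t;

/* Merge one node's snoop response into the running partial response. */
static resp_t accumulate(resp_t partial, resp_t node) {
    return node > partial ? node : partial;
}

int main(void) {
    resp_t partial = RESP_NULL;
    resp_t node_responses[] = { RESP_NULL, RESP_MODIFIED, RESP_NULL };
    bool entry_allocated = true;

    for (int n = 0; n < 3; n++) {
        partial = accumulate(partial, node_responses[n]);
        /* Another agent (RESP_MODIFIED) will service the transaction:
         * deallocate speculatively, without waiting for the final
         * combined response. */
        if (entry_allocated && partial == RESP_MODIFIED) {
            entry_allocated = false;
            printf("speculatively deallocated after node %d\n", n);
        }
    }
    return 0;
}
```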
Abstract:
A processor includes execution resources, data storage, and an instruction sequencing unit, coupled to the execution resources and the data storage, that supplies instructions within the data storage to the execution resources. At least one of the execution resources, the data storage, and the instruction sequencing unit is implemented with a plurality of hardware partitions of like function for processing data. The data processed by each hardware partition is assigned according to a selectable hash of addresses associated with the data. In a preferred embodiment, the selectable hash can be altered dynamically during the operation of the processor, for example, in response to detection of an error or a load imbalance between the hardware partitions.
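The selectable hash might look like the sketch below, where a run-time-writable selector picks which address bit steers data to a partition; the single-bit hash and the rewrite trigger are assumptions.

```c
/* Sketch of a selectable hash: a register chooses which address bit
 * steers data to a hardware partition and can be rewritten at run
 * time (e.g., on detecting a load imbalance). */
#include <stdint.h>
#include <stdio.h>

static unsigned hash_select = 6;   /* which address bit to hash on */

static unsigned partition_for(uint64_t addr) {
    return (addr >> hash_select) & 1;   /* two partitions assumed */
}

int main(void) {
    uint64_t addr = 0x40;
    printf("bit 6 hash:  partition %u\n", partition_for(addr));

    hash_select = 12;                  /* dynamically alter the hash */
    printf("bit 12 hash: partition %u\n", partition_for(addr));
    return 0;
}
```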
Abstract:
A method of maintaining coherency in a cache hierarchy of a processing unit of a computer system, wherein the upper level (L1) cache includes a split instruction/data cache. In one implementation, the L1 data cache is store-through, and each processing unit has a lower level (L2) cache. When the lower level cache receives a cache operation requiring invalidation of a program instruction in the L1 instruction cache (i.e., a store operation or a snooped kill), the L2 cache sends an invalidation transaction (e.g., icbi) to the instruction cache. The L2 cache is fully inclusive of both instructions and data. In another implementation, the L1 data cache is write-back, and a store address queue in the processor core is used to continually propagate pipelined address sequences to the lower levels of the memory hierarchy, i.e., to an L2 cache or, if there is no L2 cache, then to the system bus. If there is no L2 cache, then the cache operations may be snooped directly against the L1 instruction cache.
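The store-through variant can be sketched as below: a fully inclusive L2 that, on a store or snooped kill hitting a line also held in the L1 instruction cache, sends an icbi-style invalidation upward. The per-line directory bit and function names are assumptions.

```c
/* Sketch: inclusive L2 forwarding an invalidation (icbi-like) to the
 * L1 instruction cache when an invalidating operation hits a line
 * the L2 knows is also cached as an instruction. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    unsigned long tag;
    bool valid;
    bool in_l1_icache;   /* inclusivity lets L2 track L1I presence */
} l2_line_t;

static void icbi(unsigned long tag) {
    printf("L1 I-cache: invalidate line 0x%lx\n", tag);
}

/* Called for a local store or a snooped kill that hits in the L2. */
static void l2_invalidate_op(l2_line_t *line) {
    if (line->valid && line->in_l1_icache)
        icbi(line->tag);     /* keep the L1 I-cache coherent */
    line->in_l1_icache = false;
}

int main(void) {
    l2_line_t line = { .tag = 0x4a0, .valid = true, .in_l1_icache = true };
    l2_invalidate_op(&line);
    return 0;
}
```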
Abstract:
A multiprocessor data processing system requires careful management to maintain cache coherency. Conventional systems using a MESI approach sacrifice some performance through inefficient lock-acquisition and lock-retention techniques. The disclosed system provides additional cache states, indicator bits, and lock-acquisition routines to improve cache performance. The additional cache states allow cache state transition sequences to be optimized. In particular, the claimed system and method provide that a given processor, after acquiring a lock or reservation on a given cache line, will keep the lock in order to make successive modifications to the cache line, instead of releasing it to other processors after making only one modification. By doing so, the overhead typically required to acquire a lock before each cache line modification is eliminated for the successive modifications.
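The benefit of retention can be made concrete with the following sketch, which counts lock acquisitions across successive stores; the simplified MESI-like states and the cost model are illustrative assumptions.

```c
/* Sketch contrasting acquire-per-store with lock retention: once the
 * line is held exclusively, successive stores complete without
 * re-acquiring it. */
#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } state_t;

static int acquisitions = 0;

static void store(state_t *state, int value) {
    if (*state != EXCLUSIVE && *state != MODIFIED) {
        acquisitions++;          /* bus transaction to gain the line */
        *state = EXCLUSIVE;
    }
    *state = MODIFIED;           /* the store itself */
    (void)value;
}

int main(void) {
    state_t line = INVALID;
    for (int i = 0; i < 4; i++)  /* four successive modifications */
        store(&line, i);
    /* With retention, only the first store pays the acquisition cost. */
    printf("4 stores, %d acquisition(s)\n", acquisitions);
    return 0;
}
```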
Abstract:
A method of maintaining coherency in a multiprocessor computer system wherein each processing unit's cache has sectored cache lines. A first cache coherency state is assigned to one of the sectors of a particular cache line, and a second cache coherency state, different from the first cache coherency state, is assigned to the overall cache line while maintaining the first cache coherency state for the first sector. The first cache coherency state may provide an indication that the first sector contains a valid value which is not shared with any other cache (i.e., an exclusive or modified state), and the second cache coherency state may provide an indication that at least one of the sectors in the cache line contains a valid value which is shared with at least one other cache (a shared, recently-read, or tagged state). Other coherency states may be applied to other sectors in the same cache line. Partial intervention may be achieved by issuing a request to retrieve an entire cache line, and sourcing only a first sector of the cache line in response to the request. A second sector of the same cache line may be sourced from a third cache, and other sectors may be sourced from a system memory device of the computer system. Appropriate system bus codes are utilized to transmit cache operations to the system bus and indicate which sectors of the cache line are targets of the cache operation.
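The per-sector state arrangement can be sketched as a simple data structure, shown below; the field names and the simplified state set are assumptions for illustration.

```c
/* Sketch of a sectored line with one coherency state per sector plus
 * an overall line state, so e.g. one sector can be MODIFIED while the
 * line as a whole reports SHARED. */
#include <stdio.h>

#define SECTORS 2

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } state_t;

typedef struct {
    state_t line_state;             /* overall cache-line state     */
    state_t sector_state[SECTORS];  /* independent per-sector state */
} sectored_line_t;

static const char *name(state_t s) {
    static const char *n[] = { "I", "S", "E", "M" };
    return n[s];
}

int main(void) {
    sectored_line_t line = {
        .line_state   = SHARED,               /* some sector is shared  */
        .sector_state = { MODIFIED, SHARED }  /* sector 0 can intervene */
    };
    for (int i = 0; i < SECTORS; i++)
        printf("sector %d: %s (line: %s)\n",
               i, name(line.sector_state[i]), name(line.line_state));
    return 0;
}
```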