摘要:
A processor includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of caches at a same level. The caches, which store data utilized by the execution unit, each store only data having associated addresses within a respective one of a plurality of subsets of an address space and implement diverse caching behaviors. The diverse caching behaviors can include differing memory update policies, differing coherence protocols, differing prefetch behaviors, and differing cache line replacement policies.
摘要:
Combined response logic for a bus receives a combined data access and cast out/deallocate operation initiating by a storage device within a specific level of a storage hierarchy, with a coherency state and LRU position of the cast out/deallocate victim appended. Snoopers on the bus drive snoop responses to the combined operation with the coherency state and/or LRU position of locally-stored cache lines corresponding to the victim appended. The combined response logic determines, from the coherency state and LRU position information appended to the combined operation and the snoop responses, whether an update of the LRU position and/or coherency state of a cache line corresponding to the victim within one of the snoopers is required. If so, the combined response logic selects a snooper storage device to have at least the LRU position of a respective cache line corresponding to the victim updated, and appends an update command identifying the selected snooper to the combined response. The snooper selected to be updated may be randomly chosen, selected based on LRU position of the cache line corresponding to the victim within respective storage, or selected based on other criteria.
摘要:
In cancelling the cast out portion of a combined operation including a data access related to the cast out, the combined response logic explicitly directs the storage device initiating the combined operation not to allocate storage for the target of the data access. Instead, the target of the data access may be passed directly to an in-line processor core without storage, may be stored in a horizontal storage device, or may be stored in an in-line, noninclusive, lower level storage device. Cancellation of the cast out thus defers any latency associated with writing the cast out victim to system memory while maximizing utilization of available storage with acceptable tradeoffs in data access latency.
摘要:
A data processing system includes a processor, a system memory, one or more input/output channel controllers (IOCC), and a system bus connecting the processor, the memory and the IOCCs together for communicating instructions, address and data between the various elements of a system. The IOCC includes a paged cache storage having a number of lines wherein each line of the page may be, for example, 32 bytes. Each page in the cache also has several attribute bits for that page including the so called WIM and attribute bits. The W bit is for controlling write through operations; the I bit controls cache inhibit; and the M bit controls memory coherency. Since the IOCC is unaware of these page table attribute bits for the cache lines being DMAed to system memory, IOCC must maintain memory consistency and cache coherency without sacrificing performance. For DMA write data to system memory, new cache attributes called global, cachable and demand based write through are created. Individual writes within a cache line are gathered by the IOCC and only written to system memory when the I/O bus master accesses a different cache line or relinquishes the I/O bus.
摘要:
A processor includes execution resources, data storage, and an instruction sequencing unit, coupled to the data storage and the execution resources, that supplies instructions within the data storage to the execution resources. The execution resources include a plurality of load-store units that each process only instructions that access data having associated addresses within a respective one of a plurality of subsets of an address space. The load-store units can have diverse hardware such that a maximum number of instructions that can be concurrently executed is different for different load-store units or such that some of the load-store units are restricted to executing certain classes of instructions.
摘要:
In cancelling the cast out portion of a combined operation including a data access related to the cast out, the combined response logic explicitly directs a horizontal storage device at the same level as the storage device initiating the combined operation to allocate and store either the cast out or target data. A horizontal storage device having available space—i.e., an invalid or modified data element in a congruence class for the victim—stores either the target or the cast out data for subsequent access by an intervention. Cancellation of the cast out thus defers any latency associated with writing the cast out victim to system memory while maximizing utilization of available storage with acceptable tradeoffs in data access latency.
摘要:
A processor includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of caches at a same level. The caches, which store data utilized by the execution unit, have diverse cache hardware and each preferably store only data having associated addresses within a respective one of a plurality of subsets of an address space. The diverse cache hardware can include, for example, differing cache sizes, differing associativities, differing sectoring, and differing inclusivities.
摘要:
A processor having a hashed and partitioned register file includes at least one execution unit, an instruction sequencing unit coupled to the execution unit, and a plurality of registers coupled to the execution unit. The plurality of registers are partitioned into a plurality of groups, such that registers within each group can store only data having associated addresses within a respective one of a plurality of subsets of an address space.
摘要:
An effectively “conditional”, cast out operation or cast out portion of a combined operation including a related data access may be cancelled by the combined response to the operation. The combined response logic receives coherency state and/or LRU position information for cache lines corresponding to the cast out victim within snoopers and vertically in-line storage. The combined response logic may also receive information regarding the presence of shared or invalid cache lines in snoopers or lower level storage within the congruence class for the victim, or information regarding the read-once nature of the data access target. Based on these responses, the combined response logic determines whether the cast out should be cancelled and, if so, selects and drives the appropriate combined response code.
摘要:
Upon snooping a combined data access and cast out/deallocate operation initiating by a horizontal storage device, snoop logic determines, from LRU position information appended to the combined response to the combined operation, whether the coherency state and/or LRU position of the victim may be upgraded within the subject storage device. If so, the coherency state or LRU position is upgraded to improve global data storage management. For instance, a cache line within a snooping storage device may be altered to assume the coherency state of the victim within the storage device initiating the combined operation to improve data storage management under a given replacement policy.