Abstract:
A system and method for latency-aware thread scheduling in a non-uniform cache architecture (NUCA) are provided. Instructions may be provided to the hardware specifying in which banks to store data. Information as to which banks store which data may also be provided, for example, by the hardware. This information may be used to schedule threads on one or more cores. A selected bank in cache memory may be reserved strictly for selected data.
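The scheduling idea above can be illustrated with a minimal sketch. It assumes a hypothetical latency table `latency[core][bank]` and a map `thread_banks` from each thread to the banks holding its data; neither name comes from the abstract, and the brute-force search is only practical for a handful of cores.

```python
from itertools import permutations

def schedule_threads(latency, thread_banks):
    """Assign each thread to a distinct core, minimizing total access latency.

    latency[c][b]   -- assumed cycles for core c to reach cache bank b
    thread_banks[t] -- banks that hold the data used by thread t
    Returns a list where entry t is the core chosen for thread t.
    """
    cores = range(len(latency))

    def cost(core, thread):
        # Total latency if `thread` runs on `core` and touches each of its banks once.
        return sum(latency[core][bank] for bank in thread_banks[thread])

    # Try every assignment of threads to distinct cores and keep the cheapest.
    best = min(
        permutations(cores, len(thread_banks)),
        key=lambda assign: sum(cost(c, t) for t, c in enumerate(assign)),
    )
    return list(best)
```

For example, with two cores that are each close to one bank, a thread whose data sits in bank 1 is placed on the core nearest bank 1.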
Abstract:
A data processing system has a processor, a memory, and an instruction set architecture (ISA) that includes: (1) an asynchronous memory mover (AMM) store (ST) instruction that initiates an asynchronous memory move operation, which moves data from a first memory location having a first real address to a second memory location having a second real address by: (a) first performing a move of the data in virtual address space utilizing a source effective address and a destination effective address; and (b) when that move is completed, completing a physical move of the data to the second memory location, independent of the processor. The ISA further provides (2) an AMM terminate ST instruction for stopping an ongoing AMM operation before it completes, and (3) a LD CMP instruction for checking the status of an AMM operation.
Abstract:
A data processing system has an asynchronous memory mover, which includes multiple sets of registers for storing addressing and control parameters utilized to generate one or more asynchronous memory move (AMM) operations. The memory mover detects a receipt of a first set of parameters in a first set of registers from the processor. The processor forwards the parameters after the processor initiates a data move in virtual address space, utilizing a source effective address and a destination effective address. The memory mover responds to receiving the first set of parameters by generating and launching a first asynchronous memory move (AMM) operation. When the memory mover receives a second set of parameters in a second set of registers before the first AMM operation completes, the memory mover generates and launches a second AMM operation concurrently with the first AMM operation if no address conflicts exist.
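The concurrent-launch rule above can be sketched as follows. The abstract does not define what counts as an address conflict, so this sketch assumes a conflict is any overlap between one operation's destination range and the other's source or destination range. The names `MoveParams`, `MemoryMover`, and `receive` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MoveParams:
    src: int      # source address
    dst: int      # destination address
    length: int   # number of bytes to move

def _overlaps(a_start, a_len, b_start, b_len):
    # Half-open ranges [start, start + len) intersect.
    return a_start < b_start + b_len and b_start < a_start + a_len

def conflicts(a, b):
    # Assumed rule: a conflict exists when either operation writes a range
    # that the other one reads or writes.
    return (_overlaps(a.dst, a.length, b.dst, b.length)
            or _overlaps(a.dst, a.length, b.src, b.length)
            or _overlaps(a.src, a.length, b.dst, b.length))

class MemoryMover:
    def __init__(self):
        self.in_flight = []   # AMM operations currently running
        self.pending = []     # parameter sets held back by a conflict

    def receive(self, params):
        """Launch immediately if no in-flight operation conflicts; else hold."""
        if any(conflicts(params, op) for op in self.in_flight):
            self.pending.append(params)
            return False
        self.in_flight.append(params)
        return True
```

Holding a conflicting request in `pending` is one possible policy; the abstract only states that the second operation launches concurrently when no conflict exists.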
Abstract:
While an AMM operation is ongoing, a prefetch request for data from the source effective address or the destination effective address triggers a cache injection, by the asynchronous memory mover (or memory controller), of relevant data from the stream of data being moved in the physical memory. The memory controller forwards the first prefetched line to the prefetch engine and L1 cache. The memory controller also forwards the next cache lines in the sequence of data to the L2 cache and a subsequent set of cache lines to the L3 cache. The memory controller then forwards the remaining data to the destination memory location. Quick access to prefetch data is enabled by buffering the stream of data in the upper caches rather than placing all the moved data within the memory. Also, the memory controller avoids overrunning the upper caches by placing moved data into only a subset of the available cache lines of each upper-level cache.
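The placement order described above can be sketched as a simple split of the moved stream. The quota parameters `l2_quota` and `l3_quota` are assumptions that stand in for "a subset of the available cache lines"; the abstract gives no sizes.

```python
def distribute_lines(lines, l2_quota, l3_quota):
    """Split a stream of moved cache lines across the hierarchy.

    The first line goes to L1 (and the prefetch engine), the next
    `l2_quota` lines to L2, the following `l3_quota` lines to L3, and
    everything left over to the destination memory location.
    """
    l2_end = 1 + l2_quota
    l3_end = l2_end + l3_quota
    return {
        "L1": lines[:1],
        "L2": lines[1:l2_end],
        "L3": lines[l2_end:l3_end],
        "memory": lines[l3_end:],
    }
```

Capping each level with a quota models the stated goal of not overrunning the upper caches.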
Abstract:
In at least one embodiment, a processor detects during execution of program code whether a load instruction within the program code is associated with a hint. In response to detecting that the load instruction is not associated with a hint, the processor retrieves a full cache line of data from the memory hierarchy into the processor in response to the load instruction. In response to detecting that the load instruction is associated with a hint, the processor retrieves a partial cache line of data into the processor from the memory hierarchy in response to the load instruction.
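The two load paths can be modeled with a short sketch. The 128-byte line size and the choice to return exactly the requested bytes on the hinted path are both assumptions; the abstract does not specify the line size or the granularity of a partial line.

```python
LINE_SIZE = 128  # assumed cache line size in bytes

def load(memory, addr, size, hint=False):
    """Return the bytes brought into the processor for one load.

    Without a hint, the whole cache line containing `addr` is retrieved.
    With a hint, only the requested bytes (a partial line) are retrieved.
    """
    if hint:
        return memory[addr:addr + size]
    line_start = addr - (addr % LINE_SIZE)
    return memory[line_start:line_start + LINE_SIZE]
```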
Abstract:
A method for reconfiguring a cache memory is provided. The method in one aspect may include analyzing one or more characteristics of an execution entity accessing a cache memory and reconfiguring the cache based on the one or more characteristics analyzed. Examples of analyzed characteristics may include, but are not limited to, the data structure used by the execution entity, the expected reference pattern of the execution entity, the type of the execution entity, and the heat and power consumption of the execution entity. Examples of cache attributes that may be reconfigured may include, but are not limited to, the associativity of the cache memory, the amount of the cache memory available to store data, the coherence granularity of the cache memory, and the line size of the cache memory.
Abstract:
A microprocessor and system with improved performance and power in a simultaneous multithreading (SMT) microprocessor architecture are provided. The processor has the ability to select instructions from one thread or another in any given processor clock cycle. Instructions from each thread may be dynamically assigned selection priorities at multiple decision points in the processor in a given cycle. The thread priority is based on monitoring performance behavior and activities in the processor. In the exemplary embodiment, the present invention discloses a microprocessor and system for synchronizing thread priorities among multiple decision points throughout the micro-architecture of the microprocessor. This system and method for synchronizing thread priorities allows each thread priority to be in sync with, and aware of the status of, other thread priorities at various decision points within the microprocessor.
Abstract:
Each instruction thread in an SMT processor is associated with a software-assigned base input processing priority. Unless some predefined event or circumstance occurs with an instruction being processed or to be processed, the base input processing priorities of the respective threads are used to determine the interleave frequency between the threads according to some instruction interleave rule. However, upon the occurrence of some predefined event or circumstance in the processor related to a particular instruction thread, the base input processing priority of one or more instruction threads is adjusted to produce one or more adjusted priority values. The instruction interleave rule is then enforced according to the adjusted priority value or values together with any base input processing priority values that have not been subject to adjustment.
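The combination of base and adjusted priorities can be sketched as follows. The abstract leaves the interleave rule unspecified, so this sketch assumes a smooth weighted round-robin in which each thread receives issue slots in proportion to its effective priority. The function names and thread labels are hypothetical.

```python
def effective_priorities(base, adjustments):
    """Use the adjusted value where one exists, otherwise the base priority."""
    return {thread: adjustments.get(thread, prio) for thread, prio in base.items()}

def interleave(priorities, cycles):
    """Pick one thread per cycle, proportionally to priority (assumed rule)."""
    total = sum(priorities.values())
    credit = {thread: 0 for thread in priorities}
    schedule = []
    for _ in range(cycles):
        # Every thread earns credit equal to its priority each cycle.
        for thread, prio in priorities.items():
            credit[thread] += prio
        # The thread with the most credit issues and pays back the total.
        chosen = max(credit, key=credit.get)
        credit[chosen] -= total
        schedule.append(chosen)
    return schedule
```

With base priorities of 3 and 1, thread `T0` gets three of every four slots; if an event lowers `T0` to 1, the two threads alternate evenly.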
Abstract:
The present invention proposes a novel cache residence prediction mechanism that predicts whether requested data of a cache miss can be found in another cache. The memory controller can use the prediction result to determine whether it should immediately initiate a memory access, or initiate no memory access until a cache snoop response shows that the requested data cannot be supplied by a cache. The cache residence prediction mechanism can be implemented at the cache side, the memory side, or both. A cache-side prediction mechanism can predict that data requested by a cache miss can be found in another cache if the cache miss address matches an address tag of a cache line in the requesting cache and the cache line is in an invalid state. A memory-side prediction mechanism can make effective predictions based on observed memory and cache operations that are recorded in a prediction table.
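The cache-side rule above can be sketched in a few lines. The representation of a cache as a list of `(tag, state)` pairs, the state letter `"I"` for invalid, and the action strings are illustrative assumptions.

```python
def predict_in_other_cache(cache_lines, miss_tag):
    """Cache-side prediction: a matching tag in the invalid state suggests
    that another cache took the line and may still hold it."""
    return any(tag == miss_tag and state == "I" for tag, state in cache_lines)

def memory_controller_action(predicted_resident):
    """Defer the memory access when a cache is predicted to supply the data."""
    if predicted_resident:
        return "wait_for_snoop_response"
    return "start_memory_access"
```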
Abstract:
Methods and apparatus are provided for sharing storage and execution resources between architectural units in a microprocessor using a polymorphic function unit. A method for executing instructions in a processor having a polymorphic execution unit includes the steps of reloading a state associated with a first instruction class and reconfiguring the polymorphic execution unit to operate in accordance with the first instruction class, when an instruction of the first instruction class is encountered and the polymorphic execution unit is configured to operate in accordance with a second instruction class. The method also includes the steps of reloading a state associated with a second instruction class and reconfiguring the polymorphic execution unit to operate in accordance with the second instruction class, when an instruction of the second instruction class is encountered and the polymorphic execution unit is configured to operate in accordance with the first instruction class.
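The reload-and-reconfigure step can be modeled as a small state machine. The abstract only mentions reloading state, so stashing the outgoing class's state before the switch is an assumption of this sketch, as are the class name `PolymorphicUnit` and the example instruction classes `"fp"` and `"vector"`.

```python
class PolymorphicUnit:
    def __init__(self, configured_for):
        self.configured_for = configured_for  # instruction class currently served
        self.state = {}                       # live state of the configured class
        self.saved = {}                       # state of classes not currently configured
        self.reconfig_count = 0

    def execute(self, instr_class):
        """Execute one instruction, reconfiguring first if its class differs."""
        if instr_class != self.configured_for:
            # Assumption: stash the outgoing state so it can be reloaded later.
            self.saved[self.configured_for] = self.state
            # Reload the state associated with the incoming instruction class.
            self.state = self.saved.pop(instr_class, {})
            self.configured_for = instr_class
            self.reconfig_count += 1
        self.state["executed"] = self.state.get("executed", 0) + 1
        return self.configured_for
```

Running the sequence `"fp"`, `"vector"`, `"vector"`, `"fp"` on a unit initially configured for `"fp"` triggers a reconfiguration only at the two class changes.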