Abstract:
A method and apparatus that generates a simplified, localized version (“a local stall”) of a global stall to improve the performance of a pipelined microprocessor. The local stall is generated when a data-dependency hazard is detected for a local consumer. Utilizing circuitry used in the pipelined microprocessor's data-forwarding circuitry, the local stall is generated with a relatively minor increase in circuitry. The local stall is generated much sooner than the global stall, arriving much sooner in a local pipeline. The local pipeline utilizes the local stall to override the global stall, when appropriate, and to ensure that correct data is read for a local consumer and to operate more efficiently than a standard pipeline without a local stall.
Abstract:
A processing unit of the invention has multiple instruction pipelines for processing multi-threaded instructions. Each thread may have an urgency associated with its program instructions. The processing unit has a thread switch controller to monitor processing of instructions through the various pipelines. The thread controller also controls switch events to move from one thread to another within the pipelines. The controller may modify the urgency of any thread such as by issuing an additional instruction. The thread controller preferably utilizes certain heuristics in making switch event decisions. A time slice expiration unit may also monitor expiration of threads for a given time slice.
Abstract:
In a computer system including a plurality of resources, a device receives a request from a software program to access a specified one of the plurality of resources, determines whether the specified one of the plurality of resources is a protected resource. If the specified one of the plurality of resources is a protected resource, the device denies the request if the computer system is operating in a protected mode of operation, and processes the request based on access rights associated with the software program if the computer system is not operating in the protected mode of operation.
Abstract:
Virtual-machine-monitor operation and implementation is facilitated by number of easily implemented features and extensions added to the features of a processor architecture. These features, one or more of which are used in various embodiments of the present invention, include a vmsw instruction that provides a means for transitioning between virtualization mode and non-virtualization mode without an interruption, a virtualization fault that faults on an attempt by a priority-0 routine in virtualization mode attempting to execute a privileged instruction, and a flexible highest-implemented-address mechanism to partition virtual address space into a virtualization address space and a non-virtualization address space.
Abstract:
In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing a balanced P-LRU tree for a “multiple of 3” number of ways cache. For example, in one embodiment, such means may include an integrated circuit having a cache and a plurality of ways. In such an embodiment the plurality of ways include a quantity that is a multiple of three and not a power of two, and further in which the plurality of ways are organized into a plurality of pairs. In such an embodiment, means further include a single bit for each of the plurality of pairs, in which each single bit is to operate as an intermediate level decision node representing the associated pair of ways and a root level decision node having exactly two individual bits to point to one of the single bits to operate as the intermediate level decision nodes representing an associated pair of ways. In this exemplary embodiment, the total number of bits is N−1, wherein N is the total number of ways in the plurality of ways. Alternative structures are also presented for full LRU implementation, a “multiple of 5” number of cache ways, and variations of the “multiple of 3” number of cache ways.
Abstract:
In an embodiment, a page miss handler includes paging caches and a first walker to receive a first linear address portion and to obtain a corresponding portion of a physical address from a paging structure, a second walker to operate concurrently with the first walker, and a logic to prevent the first walker from storing the obtained physical address portion in a paging cache responsive to the first linear address portion matching a corresponding linear address portion of a concurrent paging structure access by the second walker. Other embodiments are described and claimed.
Abstract:
Methods and apparatuses for reducing step loads of processors are disclosed. Method embodiments comprise examining a number of instructions to be processed by a processor to determine the types of instructions that it has, calculating power consumption by in an execution period based on the types of instructions, and limiting the execution to a subset of instructions of the number to control the quantity of power for the execution period. Some embodiments may also create artificial activity to provide a minimum power floor for the processor. Apparatus embodiments comprise instruction type determination logic to determine types of instructions in an incoming instruction stream, a power calculator to calculate power consumption associated with processing a number of instructions in an execution period, and instruction throttling logic to control the power consumption by limiting the number of instructions to be processed in the execution period.
Abstract:
Methods and apparatuses for reducing step loads of processors are disclosed. Method embodiments comprise examining a number of instructions to be processed by a processor to determine the types of instructions that it has, calculating power consumption by in an execution period based on the types of instructions, and limiting the execution to a subset of instructions of the number to control the quantity of power for the execution period. Some embodiments may also create artificial activity to provide a minimum power floor for the processor. Apparatus embodiments comprise instruction type determination logic to determine types of instructions in an incoming instruction stream, a power calculator to calculate power consumption associated with processing a number of instructions in an execution period, and instruction throttling logic to control the power consumption by limiting the number of instructions to be processed in the execution period.
Abstract:
Methods and apparatuses for reducing step loads of processors are disclosed. Method embodiments comprise examining a number of instructions to be processed by a processor to determine the types of instructions that it has, calculating power consumption by in an execution period based on the types of instructions, and limiting the execution to a subset of instructions of the number to control the quantity of power for the execution period. Some embodiments may also create artificial activity to provide a minimum power floor for the processor. Apparatus embodiments comprise instruction type determination logic to determine types of instructions in an incoming instruction stream, a power calculator to calculate power consumption associated with processing a number of instructions in an execution period, and instruction throttling logic to control the power consumption by limiting the number of instructions to be processed in the execution period.
Abstract:
According to one embodiment, a processor includes an execution pipeline for executing a plurality of threads, including a first thread and a second thread. The processor further includes a multi-thread controller (MTC) coupled to the execution pipeline to determine whether to switch threads between the first and second thread based on a thread switch policy that is selected from a list of thread switch policies based on unfairness levels of the first and second thread, and in response to determining to switch threads, to switch from executing the first thread to executing the second thread.