摘要:
A technique to improve the performance of virtualized input/output (I/O) resources of a microprocessor within a virtual machine environment. More specifically, embodiments of the invention enable accesses of virtualized I/O resources to be made by guest software without necessarily invoking host software. Furthermore, embodiments of the invention enable more efficient delivery of interrupts to guest software by alleviating the need for host software to be invoked in the delivery process.
摘要:
A method for application managed CPU context switching. The method includes determining whether state data of a CPU is valid for a process. The determining is performed by the process itself. If the state data of the CPU is not valid for the process, the process accesses functional hardware of the CPU to load new state data into the CPU. The process then continues to execute on the CPU using the new state data. If a context switch occurs, the existing state data of the CPU is invalidated. The state data of the CPU can be invalidated by an operating system without storing the state data in main memory.
摘要:
The present invention provides for parallel subword instructions that cause results to be non-contiguously stored in a result register. For example, a targeting-type instruction can specify (implicitly or explicitly) a bit position and the result of each of the parallel subword compare operations can be stored at that bit position within the respective subword location of a result register. Alternatively, for a shifting-type instruction, pre-existing contents of a result register can be shifted one bit toward greater significance while the results are of the present operation are stored in the least-significant bits of respective result-register subword locations. This approach provides the results of multiple parallel subword compare instructions to be combined with relatively few instructions and reduces the maximum lateral movement of information—both of which can enhance performance.
摘要:
Virtual-machine-monitor operation and implementation is facilitated by number of easily implemented features and extensions added to the features of a processor architecture. These features, one or more of which are used in various embodiments of the present invention, include a vmsw instruction that provides a means for transitioning between virtualization mode and non-virtualization mode without an interruption, a virtualization fault that faults on an attempt by a priority-0 routine in virtualization mode attempting to execute a privileged instruction, and a flexible highest-implemented-address mechanism to partition virtual address space into a virtualization address space and a non-virtualization address space.
摘要:
The present invention provides a system with a cache that indicates which, if any, of its sections contain data having spent status. The invention also provides a method for identifying cache sections containing data having spent status and then purging without writing back to main memory a cache line having at least one section containing data having spent status. The invention further provides a program that specifies a cache-line section containing data that is to acquire “spent” status. “Spent” data, herein, is useless modified or unmodified data that was formerly at least potentially useful data when it was written to a cache. “Purging” encompasses both invalidating and overwriting.
摘要:
A method of efficient code generation for modulo scheduled uncounted loops includes: assigning a given stage predicate to each instruction in each stage, including assigning a given stage predicate to each instruction in each speculative stage; and using the stage predicate to conditionally enable or disable the execution of an instruction during the prologue and epilogue execution.
摘要:
A method and system for determining, at run-time, whether or not to defer an exception that arises during execution of a control-speculative load instruction based on a recent history of execution of that control-speculative load instruction. The method and system relies on recent execution history stored in a speculative-load-accelerated-deferral table. If an exception arises during execution of a control-speculative load instruction, then the speculative-load-accelerated-deferral table is searched for an entry corresponding to the control-speculative load instruction. If an entry is found, then the exception is deferred, since the speculative-load-accelerated-deferral table indicates that a recent exception arising from execution of the control-speculative load instruction was not recovered via a chk.s-mediated branch to a recovery block, and not otherwise used by a non-speculative instruction. By contrast, if no entry corresponding to the control-speculative load instruction is found in the speculative-load-accelerated-deferral table, then the exception is immediately handled.
摘要:
Virtual-machine-monitor operation and implementation is facilitated by number of easily implemented features and extensions added to the features of a processor architecture. These features, one or more of which are used in various embodiments of the present invention, include a vmsw instruction that provides a means for transitioning between virtualization mode and non-virtualization mode without an interruption, a virtualization fault that faults on an attempt by a priority-0 routine in virtualization mode attempting to execute a privileged instruction, and a flexible highest-implemented-address mechanism to partition virtual address space into a virtualization address space and a non-virtualization address space.
摘要:
A branch operation is processed using a branch predict instruction and an associated branch instruction. The branch predict instruction indicates a predicted direction, a target address, and an instruction address for the associated branch instruction. When the branch predict instruction is detected, the target address is stored at an entry indicated by the associated branch instruction address and a prefetch request is triggered to the target address. The branch predict instruction may also include hint information for managing the storage and use of the branch prediction information.
摘要:
The present invention provides for parallel subword instructions that cause results to be non-contiguously stored in a result register. For example, a targeting-type instruction can specify (implicitly or explicitly) a bit position and the result of each of the parallel subword compare operations can be stored at that bit position within the respective subword location of a result register. Alternatively, for a shifting-type instruction, pre-existing contents of a result register can be shifted one bit toward greater significance while the results are of the present operation are stored in the least-significant bits of respective result-register subword locations. This approach provides the results of multiple parallel subword compare instructions to be combined with relatively few instructions and reduces the maximum lateral movement of information—both of which can enhance performance.