Abstract:
In one embodiment, a processor includes a front end unit to fetch and decode an instruction. The front end unit includes a first random number generator to generate a random value responsive to a profileable event associated with the instruction. The processor further includes a profile logic to collect profile information associated with the instruction responsive to a sample signal, where the sample signal is based on at least a portion of the random value. Other embodiments are described and claimed.
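As a rough software illustration of the sampling idea above (not the claimed hardware), the sketch below draws a random value on each profileable event and raises a sample signal when the low bits of that value are all zero. The class name `SamplingProfiler`, the `sample_bits` parameter, and the 16-bit width of the random value are assumptions made for this example only.

```python
import random


class SamplingProfiler:
    """Hypothetical software model of random, event-driven profile sampling."""

    def __init__(self, sample_bits=4, seed=None):
        # Assumption: the sample signal uses the low `sample_bits` bits of the value.
        self.rng = random.Random(seed)
        self.mask = (1 << sample_bits) - 1
        self.samples = {}   # instruction address -> number of collected samples
        self.events = 0     # total profileable events observed

    def on_event(self, instruction_address):
        """Handle one profileable event; return True if it was sampled."""
        self.events += 1
        value = self.rng.getrandbits(16)              # the "random value"
        sample_signal = (value & self.mask) == 0      # based on a portion of it
        if sample_signal:
            count = self.samples.get(instruction_address, 0)
            self.samples[instruction_address] = count + 1
        return sample_signal
```

With `sample_bits=4`, about one event in sixteen is sampled on average; with `sample_bits=0`, every event is sampled.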
Abstract:
A method and system to provide user-level multithreading are disclosed. The method according to the present techniques comprises receiving programming instructions to execute one or more shared resource threads (shreds) via an instruction set architecture (ISA). One or more instruction pointers are configured via the ISA, and the one or more shreds are executed simultaneously by a microprocessor that includes multiple instruction sequencers.
Abstract:
In one embodiment, the present invention includes a memory management unit (MMU) having entries to store virtual address to physical address translations, where each entry includes a location indicator to indicate whether a memory location for the corresponding entry is present in a local or remote memory. In this way, a common virtual memory space can be shared between the two memories, which may be separated by one or more non-coherent links. Other embodiments are described and claimed.
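The translation entry with a location indicator can be pictured with a small software model. This is only a sketch under stated assumptions: the names `Mmu`, `TranslationEntry`, and `Location`, the 4 KiB page size, and the dictionary-based table are all hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass
from enum import Enum

PAGE_SIZE = 4096  # assumed page size for this illustration


class Location(Enum):
    LOCAL = 0
    REMOTE = 1


@dataclass(frozen=True)
class TranslationEntry:
    physical_page: int
    location: Location  # the per-entry location indicator


class Mmu:
    """Hypothetical model: one virtual address space over two memories."""

    def __init__(self):
        self.entries = {}  # virtual page number -> TranslationEntry

    def map(self, virtual_page, physical_page, location):
        self.entries[virtual_page] = TranslationEntry(physical_page, location)

    def translate(self, virtual_address):
        """Return (physical address, location); raise KeyError if unmapped."""
        virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
        entry = self.entries[virtual_page]
        return entry.physical_page * PAGE_SIZE + offset, entry.location
```

A caller could use the returned `Location` to decide whether an access is served from local memory or must be sent over the link to the remote memory.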
Abstract:
Presented are embodiments of methods and systems for library-based compilation and dispatch to automatically spread computations of a program across heterogeneous cores in a processing system. The source program contains a parallel-programming keyword, such as mapreduce, from a high-level, library-oriented parallel programming language. The compiler inserts one or more calls for a generic function, associated with the parallel-programming keyword, into the compiled code. A runtime library provides a predicate-based library system that includes multiple hardware specific implementations (“variants”) of the generic function. A runtime dispatch engine dynamically selects the best-available (e.g., most specific) variant, from a bundle of hardware-specific variants, for a given input and machine configuration. That is, the dispatch engine may take into account run-time availability of processing elements, choose one of them, and then select for dispatch an appropriate variant to be executed on the selected processing element. Other embodiments are also described and claimed.
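The predicate-based selection of a variant might look roughly like the following sketch. It assumes that "most specific" can be modeled as an integer rank and that each variant carries a predicate over the input and the set of currently available processing elements. `Variant`, `dispatch`, `SUM_VARIANTS`, and the 1024-element cutoff are hypothetical names and values chosen for illustration, and both variants compute on the host here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Variant:
    name: str
    specificity: int                        # assumed rank: higher is more specific
    predicate: Callable[[list, set], bool]  # (input data, available hardware) -> usable?
    run: Callable[[list], int]


def dispatch(variants, data, available):
    """Pick the most specific variant whose predicate holds for this call."""
    candidates = [v for v in variants if v.predicate(data, available)]
    if not candidates:
        raise LookupError("no variant matches this input and machine")
    return max(candidates, key=lambda v: v.specificity)


# Two stand-in variants of one generic "sum" function.
SUM_VARIANTS = [
    Variant("generic_cpu", 0, lambda data, hw: True, lambda data: sum(data)),
    Variant("gpu_large", 1,
            lambda data, hw: "gpu" in hw and len(data) >= 1024,
            lambda data: sum(data)),
]
```

For a small input, or when no GPU is reported as available, `dispatch` falls back to `generic_cpu`; for a large input with a GPU available it selects `gpu_large`.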
Abstract:
Methods and apparatus relating to microcode refactoring and/or caching are described. In some embodiments, an off-chip structure that stores microcode is shared by multiple processor cores. Other embodiments are also described and claimed.
Abstract:
Method, apparatus, and program means for a programmable event-driven yield mechanism that may activate other threads are described. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and an event detector to detect a long-latency event associated with a synchronization object. The event detector can cause a first thread switch in response to the long-latency event associated with the synchronization object. The apparatus may also include a spin detector to detect that the synchronization object is a contended synchronization object. The spin detector can cause a second thread switch in response to the detection of the contended synchronization object to enable a spin detect response.
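The two detectors can be approximated in software as follows. This is a minimal sketch, assuming that a long latency event means an access at or above a cycle threshold and that contention means a run of consecutive failed acquisitions of the same object. The class `YieldMonitor`, both thresholds, and the action strings are hypothetical.

```python
class YieldMonitor:
    """Hypothetical model of an event detector plus a spin detector."""

    def __init__(self, latency_threshold=100, spin_threshold=3):
        self.latency_threshold = latency_threshold  # cycles, assumed unit
        self.spin_threshold = spin_threshold        # consecutive failed attempts
        self.failed_attempts = {}                   # lock id -> failed attempts so far

    def on_sync_access(self, lock_id, latency_cycles, acquired):
        """Return the thread-switch actions triggered by one access."""
        actions = []
        # Event detector: a slow access to the synchronization object.
        if latency_cycles >= self.latency_threshold:
            actions.append("switch_on_long_latency")
        # Spin detector: repeated failures mark the object as contended.
        if acquired:
            self.failed_attempts.pop(lock_id, None)
        else:
            count = self.failed_attempts.get(lock_id, 0) + 1
            self.failed_attempts[lock_id] = count
            if count >= self.spin_threshold:
                actions.append("switch_on_contention")
        return actions
```

A successful acquisition resets the failure count, so only sustained spinning on the same object triggers the second switch.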