Abstract:
Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and an event detector to detect a long latency event associated with a synchronization object. The event detector can cause a first thread switch in response to the long latency event associated with the synchronization object. The apparatus may also include a spin detector to detect that the synchronization object is a contended synchronization object. The spin detector can cause a second thread switch in response to the detection of the contended synchronization object to enable a spin detect response.
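As a rough illustration of the spin detector idea, the sketch below flags a synchronization object as contended after repeated failed acquisitions at the same address. The abstract does not say how contention is detected, so the counting heuristic, the threshold, and the names `SpinDetector` and `observe_failed_acquire` are all assumptions made for this example.

```python
class SpinDetector:
    """Hypothetical model: repeated failed acquires on one address imply contention."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # assumed tuning knob, not from the abstract
        self.last_addr = None
        self.count = 0

    def observe_failed_acquire(self, addr):
        """Record a failed acquire; return True when a thread switch should be requested."""
        if addr == self.last_addr:
            self.count += 1
        else:
            # A different synchronization object restarts the count.
            self.last_addr = addr
            self.count = 1
        return self.count >= self.threshold
```

In this model, a `True` return value stands in for the second thread switch that enables the spin detect response.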
Abstract:
A technique for using memory attributes to relay information to a program or other agent. More particularly, embodiments of the invention relate to using memory attribute bits to check various memory properties in an efficient manner.
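One way to picture attribute bits attached to memory is a small bit field per cache line that a program can set, clear, and test. The sketch below is only a software model under that assumption; the attribute names `WATCHED` and `THREAD_OWNED` and the class `AttributedMemory` are hypothetical and do not come from the abstract.

```python
from enum import IntFlag


class MemAttr(IntFlag):
    NONE = 0
    WATCHED = 1        # hypothetical attribute
    THREAD_OWNED = 2   # hypothetical attribute


class AttributedMemory:
    """Software model of per-line attribute bits (an assumption, not the patented design)."""

    def __init__(self, num_lines):
        self.attrs = [MemAttr.NONE] * num_lines

    def set_attr(self, line, attr):
        self.attrs[line] |= attr

    def clear_attr(self, line, attr):
        self.attrs[line] &= ~attr

    def check(self, line, required):
        """Return True only if every required bit is set on the line."""
        return (self.attrs[line] & required) == required
```

A single masked comparison in `check` is what makes the property test cheap in this model.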
Abstract:
In one embodiment, the present invention includes a method for directly communicating between an accelerator and an instruction sequencer coupled thereto, where the accelerator is a heterogeneous resource with respect to the instruction sequencer. An interface may be used to provide the communication between these resources. Via such a communication mechanism a user-level application may directly communicate with the accelerator without operating system support. Further, the instruction sequencer and the accelerator may perform operations in parallel. Other embodiments are described and claimed.
Abstract:
A method and apparatus for changing the configuration of a multi-core processor is disclosed. In one embodiment, a throttle module (or throttle logic) may determine the amount of parallelism present in the currently-executing program, and change the execution of the threads of that program on the various cores. If the amount of parallelism is high, then the processor may be configured to run a larger number of threads on cores configured to consume less power. If the amount of parallelism is low, then the processor may be configured to run a smaller number of threads on cores configured for greater scalar performance.
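The throttle decision described above can be sketched as a simple policy function. This is a minimal model, assuming that parallelism is estimated as a count of runnable threads and compared against a fixed threshold; the abstract specifies neither, and the names `CoreConfig` and `choose_config` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    active_threads: int
    mode: str  # "low_power" or "scalar_performance"


def choose_config(runnable_threads, total_cores, threshold=4):
    """Pick a core configuration from an estimate of available parallelism."""
    if runnable_threads < 1 or total_cores < 1:
        raise ValueError("thread and core counts must be positive")
    active = min(runnable_threads, total_cores)
    if runnable_threads >= threshold:
        # High parallelism: more threads, cores configured to consume less power.
        return CoreConfig(active, "low_power")
    # Low parallelism: fewer threads, cores configured for scalar performance.
    return CoreConfig(active, "scalar_performance")
```

A real throttle module would presumably re-evaluate this choice as the program's parallelism changes; the function here covers a single decision only.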
Abstract:
Disclosed are embodiments of a system, methods and mechanism for using idle thread units to perform acceleration threads that are transparent to the operating system. When the operating system scheduler has no work to schedule on the idle thread units, the operating system may issue a halt or monitor/mwait or other instruction to place the thread unit into an idle state. While the thread unit is idle, from the operating system perspective, the thread unit may be utilized to perform speculative acceleration threads in order to accelerate threads running on non-idle thread units. The context of the idle thread unit is saved prior to execution of the acceleration thread and is restored when the operating system requires use of the thread unit. The acceleration threads are transparent to the operating system. Other embodiments are also described and claimed.
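The save and restore sequence around an acceleration thread can be modeled in a few lines. The sketch below is a toy state machine, assuming that a thread unit's context can be represented as a dictionary; the class `ThreadUnit` and its method names are hypothetical and stand in for hardware or firmware behavior that the abstract does not detail.

```python
class ThreadUnit:
    """Toy model of a thread unit that runs helper work while the OS sees it as idle."""

    def __init__(self, context):
        self.context = dict(context)
        self.saved = None
        self.idle = False

    def os_halt(self):
        # The OS has no work to schedule and places the unit in an idle state.
        self.idle = True

    def run_acceleration(self, helper_context):
        if not self.idle:
            raise RuntimeError("thread unit is in use by the operating system")
        if self.saved is None:
            # Save the OS-visible context once, before the first helper runs.
            self.saved = self.context
        self.context = dict(helper_context)

    def os_wake(self):
        # Restore the saved context so the helper stays invisible to the OS.
        if self.saved is not None:
            self.context = self.saved
            self.saved = None
        self.idle = False
```

Because `os_wake` always restores the saved context, the operating system in this model observes the same state it left behind.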
Abstract:
A computer system may support one or more techniques to allow dynamic pinning of the memory pages accessed by a non-CPU device, such as a graphics processing unit (GPU). The non-CPU device may support virtual-to-physical address mapping and may thus be aware of memory pages that are not pinned but that it may access. The non-CPU device may send this information to a run-time component, such as a device driver associated with the CPU. The device driver may dynamically pin the memory pages that the non-CPU device may access, and may unpin memory pages that the non-CPU device no longer accesses. Such an approach may make the pages that are no longer accessed by the non-CPU device available for allocation to other CPUs and/or non-CPU devices.
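The pin and unpin bookkeeping can be sketched as a set difference between the pages the device reports and the pages currently pinned. This is a minimal model, assuming the device reports its full working set of page numbers on each notification; the class `PinningDriver` and the method `on_device_report` are hypothetical names, and no real driver API is implied.

```python
class PinningDriver:
    """Toy model of a driver that tracks which pages are pinned for a device."""

    def __init__(self):
        self.pinned = set()

    def on_device_report(self, pages_in_use):
        """Update the pinned set; return (pages to pin, pages to unpin)."""
        in_use = set(pages_in_use)
        to_pin = in_use - self.pinned     # newly accessed, not yet pinned
        to_unpin = self.pinned - in_use   # no longer accessed by the device
        self.pinned = in_use
        return sorted(to_pin), sorted(to_unpin)
```

Pages returned in the second list are the ones that become available for allocation elsewhere.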