摘要:
Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. For example, a method according to one embodiment comprises: analyzing a single-threaded region of executing program code, the analysis including identifying dependencies within the single-threaded region; determining portions of the single-threaded region of executing program code which may be executed in parallel based on the analysis; assigning the portions to two or more parallel execution tracks; and executing the portions in parallel across the assigned execution tracks.
摘要:
Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. For example, a method according to one embodiment comprises: analyzing a single-threaded region of executing program code, the analysis including identifying dependencies within the single-threaded region; determining portions of the single-threaded region of executing program code which may be executed in parallel based on the analysis; assigning the portions to two or more parallel execution tracks; and executing the portions in parallel across the assigned execution tracks.
摘要:
Embodiments of computer-implemented methods, systems, computing devices, and computer-readable media (transitory and non-transitory) are described herein for analyzing execution of a plurality of executable instructions and, based on the analysis, providing an indication of a benefit to be obtained by vectorization of at least a subset of the plurality of executable instructions. In various embodiments, the analysis may include identification of the subset of the plurality of executable instructions suitable for conversion to one or more single-instruction multiple-data (“SIMD”) instructions.
摘要:
Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
摘要:
Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
摘要:
A micro-architecture may provide a hardware and software co-designed dynamic binary translation. The micro-architecture may invoke a method to perform a dynamic binary translation. The method may comprise executing original software code compiled targeting a first instruction set, using processor hardware to detect a hot spot in the software code and passing control to a binary translation translator, determining a hot spot region for translation, generating the translated code using a second instruction set, placing the translated code in a translation cache, executing the translated code from the translated cache, and transitioning back to the original software code after the translated code finishes execution.
摘要:
A technique for using memory attributes to relay information to a program or other agent. More particularly, embodiments of the invention relate to using memory attribute bits to check various memory properties in an efficient manner.
摘要:
Example methods and apparatus to manage object locks are disclosed. A disclosed example method includes receiving an object lock request from a processor, the lock request associated with object lock code to lock an object, and generating object lock-bypass code based on a type of the processor, the object lock-bypass code to execute in a managed runtime in response to receiving the object lock request. The example method also includes identifying a type of instruction set architecture (ISA) associated with the processor, invoking a checkpoint instruction for the processor based on the identified ISA, suspending the object lock code from executing and executing target code when the object is uncontended, and allowing the object lock code to execute when the object is contended.
摘要:
In one embodiment, an object oriented programming language can pre-fetch objects and fields within those objects to a cache memory. A hardware performance monitor can be used to identify loads that read from an address that is frequently absent from a memory. Instrumentation can be used to mark the objects that include the frequently missed address. A compiler can identify chains of objects that are frequently absent from memory. The chains of objects can be pre-fetched without regard to the types of object. Other embodiments are described and claimed.
摘要:
In one embodiment, a processor can operate in multiple modes, including a direct execution mode and an emulation execution mode. More specifically, the processor may operate in a partial emulation model in which source instruction set architecture (ISA) instructions are directly handled in the direct execution mode and translated code generated by an emulation engine is handled in the emulation execution mode. Embodiments may also provide for efficient transitions between the modes using information that can be stored in one or more storages of the processor and elsewhere in a system. Other embodiments are described and claimed.