Abstract:
In one embodiment, the present invention includes a method for communicating a request for handling of a fault or exception occurring in an accelerator to a first instruction sequencer coupled thereto. The accelerator may be a heterogeneous resource with respect to the first instruction sequencer, e.g., of a different instruction set architecture. Responsive to the request, the fault or exception may be handled in the first instruction sequencer. Other embodiments are described and claimed.
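A minimal C sketch of the proxy-handling idea described above, assuming a hypothetical fault-descriptor layout (accel_fault_t, proxy_handle_fault, and all field names are invented for illustration): the accelerator posts a fault record, and the first instruction sequencer services it on the accelerator's behalf before the accelerated computation resumes.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fault descriptor the accelerator would post to the
 * instruction sequencer; field names are illustrative only. */
typedef struct {
    uint64_t faulting_addr;   /* address that caused the fault          */
    uint32_t fault_code;      /* e.g. page-not-present, protection      */
    uint32_t accel_id;        /* which accelerator raised the fault     */
} accel_fault_t;

/* Proxy handler running on the first instruction sequencer: it resolves
 * the condition on the accelerator's behalf and signals resumption. */
static void proxy_handle_fault(const accel_fault_t *f)
{
    printf("sequencer: servicing fault code %u from accelerator %u at %#llx\n",
           f->fault_code, f->accel_id,
           (unsigned long long)f->faulting_addr);
    /* ... fix up page tables or emulate the faulting operation here ... */
}

int main(void)
{
    /* Simulated fault message as it might arrive from the accelerator. */
    accel_fault_t msg = { 0x7f00dead0000ULL, 14u, 1u };
    proxy_handle_fault(&msg);   /* sequencer handles it, accelerator resumes */
    return 0;
}
```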
Abstract:
In one embodiment, the invention provides a method comprising determining metadata encoded in instructions of a stored program; and executing the stored program based on the metadata.
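A toy C sketch of executing a program based on metadata encoded in its instructions; the 4-bit hint field and its position in the instruction word are assumptions made up for illustration, not a layout taken from the patent.

```c
#include <stdint.h>
#include <stdio.h>

/* Invented encoding: the top 4 bits of a 32-bit instruction word carry
 * metadata (e.g. a scheduling or locality hint) that the executor
 * consults before running the operation. */
#define METADATA(insn)  ((unsigned)((uint32_t)(insn) >> 28))
#define OPCODE(insn)    ((unsigned)((uint32_t)(insn) & 0x0FFFFFFFu))

int main(void)
{
    uint32_t program[2] = { 0x10000001u, 0x20000002u };  /* toy stored program */

    for (int i = 0; i < 2; i++)
        printf("insn %d: opcode %#x executes with metadata hint %u\n",
               i, OPCODE(program[i]), METADATA(program[i]));
    return 0;
}
```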
Abstract:
Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.
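A hand-written C illustration of extracting independent dependency chains from a short trace using union-find over def-use edges; the trace, register count, and one-chain-per-cluster assignment are assumptions, and the real splitting into reduced-width chains is more involved than this.

```c
#include <stdio.h>

#define NINSN 8

/* Toy trace: each instruction writes one register and reads up to two
 * (-1 means the source operand is unused). */
typedef struct { int dst, src1, src2; } insn_t;

static int parent[64];
static int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }
static void join(int a, int b) { parent[find(a)] = find(b); }

int main(void)
{
    /* Hypothetical trace containing two independent chains over r0..r5. */
    insn_t trace[NINSN] = {
        {0, -1, -1}, {1, 0, -1}, {2, 1, -1},   /* chain A: r0 -> r1 -> r2 */
        {3, -1, -1}, {4, 3, -1}, {5, 4, 3},    /* chain B: r3 -> r4 -> r5 */
        {2, 2, 1},   {5, 5, 4},
    };
    for (int r = 0; r < 64; r++) parent[r] = r;

    /* Union registers linked by def-use edges; the connected components
     * approximate the independent dependency chains of the trace. */
    for (int i = 0; i < NINSN; i++) {
        if (trace[i].src1 >= 0) join(trace[i].dst, trace[i].src1);
        if (trace[i].src2 >= 0) join(trace[i].dst, trace[i].src2);
    }

    /* Assign each instruction to the cluster owning its chain, so a narrow
     * cluster only issues instructions belonging to one chain. */
    for (int i = 0; i < NINSN; i++)
        printf("insn %d -> chain/cluster %d\n", i, find(trace[i].dst));
    return 0;
}
```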
Abstract:
A method and system to provide user-level multithreading are disclosed. The method according to the present techniques comprises receiving programming instructions to execute one or more shared resource threads (shreds) via an instruction set architecture (ISA). One or more instruction pointers are configured via the ISA, and the one or more shreds are executed simultaneously by a microprocessor that includes multiple instruction sequencers.
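The programming pattern can be mocked up in C as below; shred_t, shred_create, and shred_join are hypothetical names backed here by pthreads purely for illustration, whereas the patent's point is that shreds would be created and scheduled by the ISA's instruction sequencers without operating-system involvement.

```c
#include <pthread.h>
#include <stdio.h>

/* Illustrative stand-in only: these are not real ISA intrinsics. */
typedef pthread_t shred_t;

static int shred_create(shred_t *s, void *(*fn)(void *), void *arg)
{
    return pthread_create(s, NULL, fn, arg);   /* OS thread stands in for a shred */
}

static void shred_join(shred_t s) { pthread_join(s, NULL); }

static void *worker(void *arg)
{
    printf("shred %ld running on a sequencer-managed context\n", (long)arg);
    return NULL;
}

int main(void)
{
    shred_t s[2];
    for (long i = 0; i < 2; i++) shred_create(&s[i], worker, (void *)i);
    for (int i = 0; i < 2; i++) shred_join(s[i]);
    return 0;
}
```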
Abstract:
Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, an exemplary process includes selecting, during compilation of code having one or more threads executable in a data processing system, a current thread having the bottom-most order; determining resources allocated to one or more child threads spawned from the current thread; and allocating resources for the current thread in consideration of the resources allocated to its one or more child threads, to avoid resource conflicts between the current thread and those child threads. Other methods and apparatuses are also described.
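A small C sketch of the bottom-up idea, assuming a toy spawn tree and a single scalar resource budget per thread (both invented for illustration): each thread's reservation covers its own need plus everything reserved by its children, so parent and child never contend for the same slots.

```c
#include <stdio.h>

#define MAXT 8

/* Toy spawn tree: child[] lists the threads spawned by this thread. */
typedef struct {
    int nchild;
    int child[MAXT];
    int need;        /* resources (e.g. registers) this thread itself needs */
    int reserved;    /* total reserved, including all of its children       */
} thread_t;

/* Bottom-up pass: children are allocated first, then the parent reserves
 * its own need on top of theirs, avoiding parent/child conflicts. */
static int allocate(thread_t *t, int id)
{
    int child_total = 0;
    for (int i = 0; i < t[id].nchild; i++)
        child_total += allocate(t, t[id].child[i]);
    t[id].reserved = t[id].need + child_total;
    return t[id].reserved;
}

int main(void)
{
    /* Thread 0 spawns threads 1 and 2; thread 1 spawns thread 3. */
    thread_t t[MAXT] = {
        { 2, {1, 2}, 4, 0 },
        { 1, {3},    2, 0 },
        { 0, {0},    3, 0 },
        { 0, {0},    1, 0 },
    };
    allocate(t, 0);
    for (int i = 0; i < 4; i++)
        printf("thread %d reserves %d resource slots\n", i, t[i].reserved);
    return 0;
}
```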
Abstract:
Provided are a method, system, and program for parallelizing source code with a compiler. Source code including source code statements is received. The source code statements are processed to determine a dependency of the statements. Multiple groups of statements are determined from the determined dependency of the statements, wherein statements in one group are dependent on one another. At least one directive is inserted in the source code, wherein each directive is associated with one group of statements. Resulting threaded code is generated that includes the inserted at least one directive. The group of statements to which a directive in the resulting threaded code applies is processed as a separate task. Each group of statements designated by the directive to be processed as a separate task may be processed concurrently with respect to other groups of statements.
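The patent does not name a specific directive syntax; as a stand-in, the OpenMP task pragma below shows what the generated threaded code could look like, with each group of mutually dependent statements emitted as one task and independent groups free to run concurrently.

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int a = 0, b = 0, c = 0, d = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(a, b)   /* group 1: b depends on a */
        { a = 1; b = a + 1; }

        #pragma omp task shared(c, d)   /* group 2: d depends on c */
        { c = 2; d = c + 2; }

        #pragma omp taskwait            /* wait for both groups */
    }
    printf("a=%d b=%d c=%d d=%d\n", a, b, c, d);
    return 0;
}
```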
Abstract:
Instruction-level parallelism in software pipelined loops is exploited by predicting future register rotations. A processor includes an architected current frame marker register and at least one unarchitected frame marker register. Register rotation prediction is achieved by setting the register rotation of future iterations of a software loop to be a function of the unarchitected frame marker registers. True data dependencies remain, but the dependencies caused solely by register renaming are removed. Dynamic predication is used to predicate instructions from future iterations, allowing them to be squashed if dependencies are later found. The register renaming that results from the prediction can be included in instructions in a buffer, or a renaming stage in an execution pipeline can perform the renaming.
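A small C simulation of the renaming arithmetic, assuming an Itanium-style rotating register file of eight registers and an invented rrb variable standing in for the frame markers: the architected marker gives the current mapping, while an unarchitected copy holds the predicted rotation of a future iteration so its renames can be issued speculatively.

```c
#include <stdio.h>

#define ROT_SIZE 8   /* size of the rotating portion of the register file */

/* Rotation: the physical register backing rotating register r depends on
 * the register rename base (rrb), which decrements on each loop rotation. */
static int rename_reg(int r, int rrb)
{
    return ((r + rrb) % ROT_SIZE + ROT_SIZE) % ROT_SIZE;
}

int main(void)
{
    int rrb_architected = 0;          /* current frame marker's rotation    */

    for (int iter = 0; iter < 3; iter++) {
        /* Prediction: an unarchitected frame marker holds the rotation of a
         * future iteration; if a true dependence is found later, any work
         * issued under this mapping would be squashed. */
        int rrb_predicted = rrb_architected - 1;

        printf("iter %d: rotating reg 0 maps to p%d, next iteration predicted p%d\n",
               iter,
               rename_reg(0, rrb_architected),
               rename_reg(0, rrb_predicted));

        rrb_architected--;            /* loop branch rotates the file       */
    }
    return 0;
}
```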
Abstract:
Thread-data affinity optimization can be performed by a compiler during the compiling of a computer program to be executed on a cache coherent non-uniform memory access (cc-NUMA) platform. In one embodiment, the present invention includes receiving a program to be compiled. The received program is then compiled in a first pass and executed. During execution, the compiler collects profiling data using a profiling tool. Then, in a second pass, the compiler performs thread-data affinity optimization on the program using the collected profiling data.
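The patent's transformation is profile-driven, but the effect it targets can be illustrated by hand in OpenMP C: pairing the first-touch initialization and the compute loop under the same static schedule keeps each thread's data on its local cc-NUMA node. The array size and static schedule are illustrative choices only, not values from the patent.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000
static double x[N], y[N];

int main(void)
{
    /* First touch: each thread initializes the chunk it will later compute,
     * so first-touch page placement puts that chunk on its local node. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 0.0; }

    /* Same static schedule: each thread computes over the pages it owns. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++) y[i] = 2.0 * x[i];

    printf("y[42] = %f\n", y[42]);
    return 0;
}
```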