Abstract:
A method and system to provide user-level multithreading are disclosed. The method according to the present techniques comprises receiving programming instructions to execute one or more shared resource threads (shreds) via an instruction set architecture (ISA). One or more instruction pointers are configured via the ISA; and the one or more shreds are executed simultaneously with a microprocessor, wherein the microprocessor includes multiple instruction sequencers.
Abstract:
Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.
Abstract:
A method to provide effective control and data flow information in an Intermediate Representation (IR) form. A Path Sensitive single Assignment (PSA) IR form with effective and explicit control and data path information supports control flow sensitive optimizations such as path sensitive symbolic substitution, array privatization and speculative multi threading. In the definition of PSA form, besides defining new versioned variables, the gamma functions keep control path information. The gamma function in PSA form keeps the basic attribute of SSA IR form and only one definition exists for each use. Therefore, all existing Single Static Assignment (SSA) IR form based analysis can be applied in PSA form. The gamma function in PSA form keeps all essential control flow information and eliminates unnecessary predicates at the same time.
Abstract:
A Markov chain model of a software system may be used to compute all-pairs reaching probabilities to provide guidance in performing speculative operations with respect to the software system.
Abstract:
A data processing apparatus, a computer, an article including a machine-accessible medium, and a method of processing data are disclosed. The data processing apparatus may include a pair of pipelines sharing an instruction cache, data cache, and a branch predictor with the second pipeline running ahead of the first pipeline using a data value prediction module. The pipelines may be included in one or more processors and coupled to a memory to form a computer. The method includes executing a plurality of instructions using the pipeline pair, such that when a cache miss is encountered by the second pipeline during execution of a LOAD instruction, the data value prediction module supplies a predicted load value in lieu of a cached value, enabling continued execution of the plurality of instructions by the second pipeline without waiting for the return of the cached value.
Abstract:
A method and apparatus for enabling the speculative forking of a speculative thread is disclosed. In one embodiment, a speculative fork instruction is conditioned by the results of a fork predictor. The fork predictor may issue predictions as to whether or not a speculative thread would execute desirably. The fork predictor may be implemented as a modified branch predictor circuit, and may have execution history updates entered by a determination of whether or not the execution of a speculative thread was or would have been desirable.
Abstract:
A method and system to provide user-level multithreading are disclosed. The method according to the present techniques comprises receiving programming instructions to execute one or more shared resource threads (shreds) via an instruction set architecture (ISA). One or more instruction pointers are configured via the ISA; and the one or more shreds are executed simultaneously with a microprocessor, wherein the microprocessor includes multiple instruction sequencers.
Abstract:
Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.
Abstract:
Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.
Abstract:
Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.