Abstract:
Embodiments described herein utilize restricted transactional memory (RTM) instructions to implement speculative compile time optimizations that will be automatically rolled back by hardware in the event of a missed speculation. In one embodiment, a lightweight version of RTM for speculative compiler optimization is described to provide lower operational overhead in comparison to conventional RTM implementations used when performing SLE.
Abstract:
Loop vectorization methods and apparatus are disclosed. An example method includes setting a dynamic adjustment value of a vectorization loop; executing the vectorization loop to vectorize a loop by grouping iterations of the loop into one or more vectors; identifying a dependency between iterations of the loop as; and setting the dynamic adjustment value based on the identified dependency.
Abstract:
Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes, at runtime, identifying, by executing an instruction with one or more processors, a first loop iteration that cannot be executed in parallel with a second loop iteration due to a set of conflicting scalar loop operations. The first loop iteration is executed after the second loop iteration. The method also includes sectioning, by executing an instruction with one or more processors, a vector loop into vector partitions including a first vector partition. The first vector partition executes consecutive loop iterations in parallel and the consecutive loop iterations start at the second loop iteration and end before the first loop iteration.
Abstract:
Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.
Abstract:
Embodiments described herein utilize restricted transactional memory (RTM) instructions to implement speculative compile time optimizations that will be automatically rolled back by hardware in the event of a missed speculation. In one embodiment, a lightweight version of RTM for speculative compiler optimization is described to provide lower operational overhead in comparison to conventional RTM implementations used when performing SLE.
Abstract:
Loop vectorization methods and apparatus are disclosed. An example method includes prior to executing an original loop having iterations, analyzing, via a processor, the iterations of the original loop, identifying a dependency between a first one of the iterations of the original loop and a second one of the iterations of the original loop, after identifying the dependency, vectorizing a first group of the iterations of the original loop based on the identified dependency to form a vectorization loop, and setting a dynamic adjustment value of the vectorization loop based on the identified dependency.
Abstract:
Methods and systems to convert scalar computer program loops having loop carried dependences to vector computer program loops are disclosed. One example method and system generates a first predicate set associated with a first conditionally executed statement. The first predicate set contains a first set of predicates that cause a variable to be defined in a scalar computer program loop at or before the variable is defined by the first conditionally executed statement. The method and system also generates a second predicate set associated with the first conditionally executed statement. The second predicate set contains a second set of predicates that cause the variable to be used in the scalar computer program loop at or before the variable is defined by the first conditionally executed statement. The method and system determines whether the second predicate set is a subset of the first predicate set and, based on the determination, propagates a vector value in an element of a vector of the variable to a subsequent element of the vector.
Abstract:
An apparatus and method are described for detecting and responding to fault conditions in a processor. For example, one embodiment of a method comprises: reading each active element in succession from a first vector register, each active element specifying an address for a gather or load operation; detecting one or more fault conditions associated with one or more of the active elements; for each active element read in succession prior to a detected fault condition on an element other than the first active element, storing the data loaded from an address associated with the active element in a first output vector register; and for each active element associated with the detected fault condition and following the detected fault condition, setting a bit in an output mask register to indicate the detected fault condition.