Abstract:
Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes replacing a scalar recurrence operation in the scalar computer program loop with a first vector summing operation and a first vector recurrence operation. The first vector summing operation is to generate a first running sum, and the first vector recurrence operation is to generate a first vector. In some examples, the first vector recurrence operation is based on the scalar recurrence operation. Disclosed methods also include inserting: 1) a renaming operation to rename the first vector; 2) a second vector summing operation to generate a second running sum; and 3) a second vector recurrence operation to generate a second vector based on the renamed first vector.
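As a rough illustration of the running-sum approach, the sketch below vectorizes the simplest additive recurrence, x[i] = x[i-1] + a[i], in chunks of VL lanes. The recurrence form, VL, and the helper names (scalar_recurrence, vector_running_sum, vectorized_recurrence) are assumptions, not taken from the abstract. The renaming step is modeled only as carrying the last lane of one chunk into the next.

```python
from typing import List

VL = 4  # hypothetical vector length (lanes per vector)


def scalar_recurrence(a: List[int], x0: int = 0) -> List[int]:
    """Reference scalar loop: x[i] = x[i-1] + a[i]."""
    out, x = [], x0
    for v in a:
        x += v
        out.append(x)
    return out


def vector_running_sum(chunk: List[int]) -> List[int]:
    """Inclusive running sum across lanes in log2(VL) shift-and-add steps."""
    s = list(chunk)
    shift = 1
    while shift < len(s):
        # Each lane adds the value `shift` lanes to its left (0 past lane 0).
        s = [s[i] + (s[i - shift] if i >= shift else 0) for i in range(len(s))]
        shift *= 2
    return s


def vectorized_recurrence(a: List[int], x0: int = 0) -> List[int]:
    """Hypothetical chunked vector form of scalar_recurrence."""
    out: List[int] = []
    carry = x0  # recurrence value entering the current chunk
    for start in range(0, len(a), VL):
        running = vector_running_sum(a[start:start + VL])
        # Vector recurrence: add the carried-in scalar to every lane.
        vec = [carry + r for r in running]
        out.extend(vec)
        # "Rename" the result: its last lane seeds the next chunk.
        carry = vec[-1]
    return out
```

For example, vectorized_recurrence([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) returns [3, 4, 8, 9, 14, 23, 25, 31, 36, 39], matching the scalar loop.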
Abstract:
A vector reduction instruction with a non-unit strided access pattern is received and executed by the execution circuitry of a processor. In response to the instruction, the execution circuitry performs an associative reduction operation on data elements of a first vector register. Based on values of a mask register and the current element position being processed, the execution circuitry sequentially sets one or more data elements of the first vector register to a result, which is generated by the associative reduction operation applied to both a previous data element of the first vector register and a data element of a third vector register. The previous data element is located more than one element position away from the current element position.
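A scalar model of the strided reduction described above may help, assuming each register is a Python list of lanes and addition is the default associative operation. The function name strided_reduce, the stride parameter, and the rule that positions closer than stride to lane 0 are left unchanged are illustrative assumptions. Because lanes are processed in order, each updated element can feed a later one.

```python
import operator
from typing import Callable, List, Sequence


def strided_reduce(dst: Sequence[int], src: Sequence[int], mask: Sequence[bool],
                   stride: int,
                   op: Callable[[int, int], int] = operator.add) -> List[int]:
    """Hypothetical model of a masked reduction whose 'previous' element
    lies `stride` lanes back in the destination register."""
    if stride < 2:
        raise ValueError("sketch models non-unit strides only (stride >= 2)")
    out = list(dst)
    for i in range(len(out)):  # element positions are processed in order
        if mask[i] and i >= stride:
            # Combine the (possibly already updated) element `stride` lanes
            # back with the source element at the current position.
            out[i] = op(out[i - stride], src[i])
    return out
```

With stride 2, for instance, the even and odd lanes form two independent running reductions: strided_reduce([1, 2, 0, 0, 0, 0], [0, 0, 3, 4, 5, 6], [True] * 6, 2) gives [1, 2, 4, 6, 9, 12].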
Abstract:
Loop vectorization methods and apparatus are disclosed. An example method includes, prior to executing an original loop having iterations: analyzing, via a processor, the iterations of the original loop; identifying a dependency between a first one and a second one of the iterations of the original loop; after identifying the dependency, vectorizing a first group of the iterations of the original loop based on the identified dependency to form a vectorization loop; and setting a dynamic adjustment value of the vectorization loop based on the identified dependency.
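A minimal sketch of analyzing iterations before execution follows. It assumes the loop body is a[writes[i]] = a[reads[i]] + 1 with both index arrays known in advance, and it takes the dynamic adjustment value to be the smallest flow-dependence distance, capped at a hypothetical vector length VL. The function names and this choice of adjustment value are illustrative assumptions. Grouping at most that many iterations keeps every dependent read in a later group than the write it depends on.

```python
from typing import Dict, List, Sequence, Tuple

VL = 8  # hypothetical hardware vector length


def min_dependency_distance(writes: Sequence[int], reads: Sequence[int]) -> int:
    """Smallest j - i > 0 such that iteration j reads what iteration i wrote."""
    best = len(writes)                 # no flow dependency: any grouping is safe
    last_writer: Dict[int, int] = {}   # location -> most recent writing iteration
    for j in range(len(writes)):
        if reads[j] in last_writer:
            best = min(best, j - last_writer[reads[j]])
        last_writer[writes[j]] = j
    return max(best, 1)


def run_scalar(a: List[int], writes: Sequence[int], reads: Sequence[int]) -> List[int]:
    a = list(a)
    for i in range(len(writes)):
        a[writes[i]] = a[reads[i]] + 1
    return a


def run_vectorized(a: List[int], writes: Sequence[int],
                   reads: Sequence[int]) -> Tuple[List[int], int]:
    a = list(a)
    n = len(writes)
    # Dynamic adjustment value, fixed by analysis before the loop runs.
    step = min(VL, min_dependency_distance(writes, reads))
    for start in range(0, n, step):
        lanes = range(start, min(start + step, n))
        gathered = [a[reads[i]] + 1 for i in lanes]  # vector gather + add
        for i, v in zip(lanes, gathered):            # vector scatter, lane order
            a[writes[i]] = v
    return a, step
```

For the loop a[i + 3] = a[i] + 1, the analysis finds a distance of 3, so the vectorized loop processes three iterations per group.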
Abstract:
A computer-implemented method for managing loop code in a compiler includes using a conflict detection procedure, which detects across-iteration dependencies for arrays of single memory addresses, to determine whether a potential across-iteration dependency exists for arrays of memory addresses that denote ranges of memory accessed by the loop code.
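One way to reuse a single-address conflict check for ranges is sketched below. This specific technique is an assumption substituted for illustration, since the abstract does not say how ranges are mapped onto the single-address procedure. Each range is coarsened to hypothetical BLOCK-sized blocks, with BLOCK at least as large as the longest range, so a range touches at most its first and last block. Any two overlapping ranges must then share one of those blocks, and running a VPCONFLICTD-style check (each lane compared with all earlier lanes) over the first and last block indices finds every such pair. The answer is conservative: ranges that share a block without overlapping are still reported as a potential dependency.

```python
from typing import List, Sequence, Tuple

BLOCK = 64  # hypothetical block granularity; must be >= the longest range


def conflict_detect(addrs: Sequence[int]) -> List[int]:
    """Single-address conflict detection: for each lane i, a bitmask of the
    earlier lanes j < i that hold the same address."""
    masks = []
    for i, a in enumerate(addrs):
        m = 0
        for j in range(i):
            if addrs[j] == a:
                m |= 1 << j
        masks.append(m)
    return masks


def potential_range_dependency(ranges: Sequence[Tuple[int, int]]) -> bool:
    """ranges[i] = (start, length) in bytes accessed by iteration i.
    Returns True if two different iterations may touch a common block."""
    n = len(ranges)
    if any(length <= 0 or length > BLOCK for _, length in ranges):
        raise ValueError("sketch assumes 0 < length <= BLOCK")
    first = [start // BLOCK for start, _ in ranges]
    last = [(start + length - 1) // BLOCK for start, length in ranges]
    # Lanes 0..n-1 hold first blocks; lanes n..2n-1 hold last blocks.
    for lane, m in enumerate(conflict_detect(first + last)):
        it = lane % n
        m &= ~(1 << it)  # ignore an iteration matching its own first block
        if m:
            return True
    return False
```

For example, (60, 8) and (64, 4) overlap and are flagged, while (0, 16), (64, 16), and (128, 16) fall in distinct blocks and are not.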
Abstract:
Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes, at runtime, identifying, by executing an instruction with one or more processors, a first loop iteration that cannot be executed in parallel with a second loop iteration due to a set of conflicting scalar loop operations. The first loop iteration is executed after the second loop iteration. The method also includes sectioning, by executing an instruction with one or more processors, a vector loop into vector partitions including a first vector partition. The first vector partition executes consecutive loop iterations in parallel and the consecutive loop iterations start at the second loop iteration and end before the first loop iteration.
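A simplified reading of the partitioning step is sketched below for a histogram loop, counts[idx[i]] += 1, where two iterations conflict when they update the same bin. Each partition starts at some iteration and ends just before the first later iteration that conflicts with one already in the partition, or after a hypothetical vector length vl. The function names and the histogram loop are illustrative assumptions, and the conflicts are found by plain Python comparisons rather than by a processor instruction.

```python
from typing import List, Sequence


def partition_iterations(idx: Sequence[int], vl: int) -> List[range]:
    """Split iterations into partitions of consecutive iterations that can run
    in parallel. A partition ends just before the first iteration whose bin was
    already used in the partition (the conflicting iteration), or after vl."""
    if vl < 1:
        raise ValueError("vl must be >= 1")
    parts: List[range] = []
    start = 0
    while start < len(idx):
        seen = set()
        end = start
        while end < len(idx) and end - start < vl and idx[end] not in seen:
            seen.add(idx[end])
            end += 1
        parts.append(range(start, end))
        start = end
    return parts


def histogram_vectorized(idx: Sequence[int], nbins: int, vl: int = 4) -> List[int]:
    """Hypothetical vectorized form of: for i: counts[idx[i]] += 1."""
    counts = [0] * nbins
    for part in partition_iterations(idx, vl):
        bins = [idx[i] for i in part]
        loaded = [counts[b] for b in bins]  # vector gather
        for b, v in zip(bins, loaded):      # vector scatter; bins are unique here
            counts[b] = v + 1
    return counts
```

For idx = [1, 2, 1, 3, 3, 0] and vl = 4, the partitions are iterations [0, 1], [2, 3], and [4, 5], because iterations 2 and 4 repeat a bin already used in their partition.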
Abstract:
Technologies for automatic loop vectorization include a computing device with an optimizing compiler. During an optimization pass, the compiler identifies a loop and generates a transactional code segment containing a vectorized implementation of the loop body, including one or more vector memory read instructions capable of generating an exception. The compiler also generates a non-transactional fallback code segment including a scalar implementation of the loop body that is executed in response to an exception generated within the transactional code segment. The compiler may detect whether the loop contains a memory read dependent on a condition that may be updated in a previous iteration or whether the loop contains a potential data dependence between two iterations. The compiler may generate a dynamic check for an actual data dependence and an explicit transactional abort instruction to be executed when an actual data dependence exists. Other embodiments are described and claimed.
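The control flow above can be emulated in software, as a sketch only. Python has no hardware transactions, so the transaction here is a copy of the array that is committed only if the vector path finishes. The loop shape (if cond[i]: a[writes[i]] = a[reads[i]] * 2), the TransactionAbort exception standing in for an explicit abort instruction, and all function names are hypothetical. The vector path reads every lane speculatively, ignoring cond, which is how an out-of-range index on an inactive lane can raise an exception that the scalar loop would never hit.

```python
from typing import List, Sequence


class TransactionAbort(Exception):
    """Hypothetical stand-in for an explicit transactional abort instruction."""


def scalar_body(a: List[int], cond: Sequence[bool],
                writes: Sequence[int], reads: Sequence[int]) -> None:
    for i in range(len(cond)):
        if cond[i]:
            a[writes[i]] = a[reads[i]] * 2


def vector_body(a: List[int], cond: Sequence[bool],
                writes: Sequence[int], reads: Sequence[int]) -> None:
    n = len(cond)
    active = [i for i in range(n) if cond[i]]
    # Dynamic check: does an active lane read what an earlier active lane writes?
    for k, i in enumerate(active):
        if any(reads[j] == writes[i] for j in active[k + 1:]):
            raise TransactionAbort("actual data dependence")
    # Speculative vector read of every lane; may raise IndexError.
    gathered = [a[reads[i]] * 2 for i in range(n)]
    for i in active:  # masked vector store
        a[writes[i]] = gathered[i]


def run_loop(a: List[int], cond: Sequence[bool],
             writes: Sequence[int], reads: Sequence[int]) -> str:
    """Returns which path ran: 'vector' (committed) or 'scalar' (fallback)."""
    snapshot = list(a)  # emulated transactional buffer
    try:
        vector_body(snapshot, cond, writes, reads)
    except (TransactionAbort, IndexError):
        # Abort: discard buffered writes and run the non-transactional fallback.
        scalar_body(a, cond, writes, reads)
        return "scalar"
    a[:] = snapshot  # commit buffered writes
    return "vector"
```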
Abstract:
An apparatus and method for speculative vectorization are disclosed. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.
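A software model of the address queue can make the comparison step concrete. In this sketch the queue is a bounded deque that evicts its oldest entry when full, and the "specified range of locations" is assumed to be the window most recent entries. The class name AddressQueue and the capacity and window parameters are hypothetical, not taken from the abstract.

```python
from collections import deque
from typing import Deque


class AddressQueue:
    """Hypothetical model of the address queue: holds addresses from prior
    vectorized memory accesses, evicting the oldest entry when full."""

    def __init__(self, capacity: int) -> None:
        self.entries: Deque[int] = deque(maxlen=capacity)

    def access(self, addr: int, window: int) -> bool:
        """Compare addr with the `window` most recent entries, then store it.
        Returns True when a conflict (matching address) is found."""
        recent = list(self.entries)[-window:] if window > 0 else []
        conflict = addr in recent
        self.entries.append(addr)
        return conflict
```

Restricting the comparison to a window lets a caller check only the accesses that could still be in flight, rather than every address the queue holds.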
Abstract:
Loop vectorization methods and apparatus are disclosed. An example method includes setting a dynamic adjustment value of a vectorization loop; executing the vectorization loop to vectorize a loop by grouping iterations of the loop into one or more vectors; identifying a dependency between iterations of the loop; and setting the dynamic adjustment value based on the identified dependency.
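In contrast to fixing the group size ahead of time, the adjustment value can be set per group while the vectorization loop runs, as in the sketch below. It assumes the loop body a[writes[i]] = a[reads[i]] + 1 and a hypothetical vector length VL. Each group starts at full width, and if a lane reads a location written by an earlier lane in the same group, the adjustment value is reset so the group ends just before that lane. The function name and this policy are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

VL = 4  # hypothetical vector length


def run_adaptive(a: List[int], writes: Sequence[int],
                 reads: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Vectorized form of: for i: a[writes[i]] = a[reads[i]] + 1.
    Returns the updated array and the adjustment value used for each group."""
    a = list(a)
    n = len(writes)
    steps: List[int] = []
    i = 0
    while i < n:
        adjust = min(VL, n - i)  # dynamic adjustment value: start at full width
        # Identify the first lane that reads a location an earlier lane writes.
        for k in range(i, i + adjust):
            if any(writes[p] == reads[k] for p in range(i, k)):
                adjust = k - i   # group ends just before the dependent lane
                break
        lanes = range(i, i + adjust)
        gathered = [a[reads[k]] + 1 for k in lanes]  # vector gather + add
        for k, v in zip(lanes, gathered):            # vector scatter, lane order
            a[writes[k]] = v
        steps.append(adjust)
        i += adjust  # advance by the adjusted amount
    return a, steps
```

Because the first lane of a group never depends on an earlier lane in that group, the adjustment value is always at least 1 and the loop always makes progress.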