Abstract:
A processor includes a front end with a decoder that decodes a branch instruction to perform a branch operation. The processor includes a loop stream unit with logic to identify from the branch instruction that the branch operation is a loop operation, determine whether the loop operation will include a fixed or effectively-infinite number of iterations, load decoded instructions of a loop iteration of the loop operation, and cyclically issue the decoded instructions of the loop iteration in a manner that depends on whether the loop operation will include a fixed or effectively-infinite number of iterations. The processor also includes an execution unit to execute the branch instruction and a retirement unit to retire the branch instruction.
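The Python sketch below is a hypothetical software model of the issue behavior described above, not the claimed hardware. It treats a loop as effectively infinite when its trip count is unknown or exceeds an assumed threshold. A fixed loop's buffered body is replayed exactly trip-count times. An effectively-infinite loop's body is replayed until a cap, which here stands in for whatever exit or misprediction signal real hardware would use. The names `LoopInfo`, `is_fixed`, `issue_loop`, and `EFFECTIVELY_INFINITE_THRESHOLD` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold (assumed, not from the source): trip counts at or
# above this value are treated as effectively infinite.
EFFECTIVELY_INFINITE_THRESHOLD = 1 << 16


@dataclass
class LoopInfo:
    body: List[str]            # decoded instructions of one loop iteration
    trip_count: Optional[int]  # None when the count cannot be determined


def is_fixed(loop: LoopInfo) -> bool:
    """Classify the loop as fixed (True) or effectively infinite (False)."""
    return (loop.trip_count is not None
            and loop.trip_count < EFFECTIVELY_INFINITE_THRESHOLD)


def issue_loop(loop: LoopInfo, max_issue: int) -> List[str]:
    """Cyclically issue the buffered loop body.

    Fixed loops issue exactly trip_count iterations and then stop.
    Effectively-infinite loops keep cycling through the body until
    max_issue instructions have been issued (a stand-in for an exit signal).
    """
    issued: List[str] = []
    if not loop.body:
        return issued
    if is_fixed(loop):
        for _ in range(loop.trip_count):
            issued.extend(loop.body)
    else:
        i = 0
        while len(issued) < max_issue:
            issued.append(loop.body[i % len(loop.body)])
            i += 1
    return issued
```

In this model, the fixed case knows when to stop issuing. The effectively-infinite case must be stopped externally, which mirrors the distinction the abstract draws between the two kinds of loop.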
Abstract:
Embodiments of a method and apparatus for implementing and maintaining a stack of predicate values with stack synchronization instructions are described. In one embodiment, the apparatus is an out-of-order hardware/software co-designed processor that includes instructions to explicitly manage the predicate register stack, maintaining stack consistency across branches of execution that push a variable number of predicate values onto the predicate stack. In one embodiment, the stack-based predicate register implementation enables early branch calculation and early branch misprediction recovery via early renaming of predicate registers.
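The following Python sketch is a hypothetical model of why a stack synchronization operation helps. Two execution paths may push different numbers of predicate values. A synchronization step at the merge point truncates the stack back to a recorded depth, so both paths leave it in the same state. The class `PredicateStack`, its `sync` method, and the helper `run_path` are illustrative assumptions, not the instructions defined in the source.

```python
from typing import List


class PredicateStack:
    """Hypothetical model of a stack of predicate (boolean) values."""

    def __init__(self) -> None:
        self._stack: List[bool] = []

    def push(self, value: bool) -> None:
        self._stack.append(value)

    def pop(self) -> bool:
        if not self._stack:
            raise IndexError("pop from empty predicate stack")
        return self._stack.pop()

    def top(self) -> bool:
        return self._stack[-1]

    def depth(self) -> int:
        return len(self._stack)

    def sync(self, depth: int) -> None:
        """Hypothetical synchronization: discard entries above `depth` so
        every path reaching a merge point sees the same stack height."""
        if depth > len(self._stack):
            raise ValueError("cannot synchronize to a deeper stack")
        del self._stack[depth:]


def run_path(pushes: List[bool]) -> PredicateStack:
    """Simulate one branch path that pushes a variable number of values."""
    s = PredicateStack()
    s.push(True)              # predicate established before the branch
    merge_depth = s.depth()   # depth recorded for the merge point
    for p in pushes:          # this path pushes len(pushes) extra values
        s.push(p)
    s.sync(merge_depth)       # restore a consistent depth at the merge
    return s
```

For example, `run_path([True, False, True])` and `run_path([])` both return a stack of depth 1 whose top is `True`. Because the stack position at the merge point no longer depends on which path was taken, a renamer could, in principle, assign predicate registers for code after the merge before the branch resolves.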