Abstract:
A technique for flexible scheduling of a code sequence, wherein a set of instructions is generated for determining a fully resolved predicate for each of a set of non-speculative instructions contained in the code sequence. An optimized code sequence is then generated that includes the instructions for determining the fully resolved predicates and that further includes the non-speculative instructions, each guarded by one of the fully resolved predicates, such that any one of the non-speculative instructions may be executed before any other of the non-speculative instructions.
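The idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two-branch code sequence, the names `store_a` and `store_b`, and the helper functions are hypothetical and do not come from the abstract. Each guarded operation receives a predicate that already accounts for every earlier exit branch, so the guarded operations produce the same effects in any order.

```python
from itertools import permutations, product

def run_original(c1, c2):
    """Reference semantics: exit branches evaluated in program order."""
    effects = set()
    if c1:                      # first exit branch
        return effects
    effects.add("store_a")
    if c2:                      # second exit branch
        return effects
    effects.add("store_b")
    return effects

def resolve_predicates(c1, c2):
    """Fully resolved predicates: each one folds in all earlier branch conditions."""
    return {
        "store_a": not c1,
        "store_b": (not c1) and (not c2),
    }

def run_scheduled(c1, c2, order):
    """Execute the guarded operations in an arbitrary order."""
    preds = resolve_predicates(c1, c2)
    return {op for op in order if preds[op]}

def all_orders_agree():
    """Check every input combination and every ordering against the reference."""
    ops = ("store_a", "store_b")
    return all(
        run_scheduled(c1, c2, order) == run_original(c1, c2)
        for c1, c2 in product((False, True), repeat=2)
        for order in permutations(ops)
    )
```

Because no guarded operation depends on control flow any longer, a scheduler would be free to place `store_b` ahead of `store_a` in this toy model.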
Abstract:
A computer system provides fast evaluation of predicates and Boolean expressions with a set of operations for determining a value in a specified register from a plurality of inputs. The execution of each operation is defined by two functions of the operation's inputs: a result function, which yields a result value, and an enable function, which determines whether the result value is written to the specified register. To evaluate a Boolean expression with the operations, the register is preset to a Boolean value, e.g., one for an AND reduction or zero for an OR reduction. Each operation can then write a Boolean value, e.g., zero for an AND reduction or one for an OR reduction, to the register if that operation's enable function evaluates true. The register then stores the correct value of the expression. The expression's value can be used as a predicate to conditionally execute operations in a program. Preferably, the operations are executed in parallel by plural functional units, and the register is capable of accepting multiple values written simultaneously, so long as they are identical.
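A minimal model of the preset-then-conditionally-write scheme is sketched below. The class `PredicateRegister` and the helpers `reduce_with`, `and_reduce`, and `or_reduce` are hypothetical names chosen for illustration. The list of writes stands in for operations issued in parallel, and the register rejects simultaneous writes that disagree.

```python
class PredicateRegister:
    """One-bit register that accepts simultaneous writes only if they are identical."""

    def __init__(self, preset):
        self.value = preset

    def write_parallel(self, values):
        if not values:
            return                      # no operation was enabled; keep the preset
        if len(set(values)) != 1:
            raise ValueError("conflicting simultaneous writes")
        self.value = values[0]

def reduce_with(preset, result_fn, enable_fn, inputs):
    """Preset the register, then let every enabled operation write its result."""
    reg = PredicateRegister(preset)
    writes = [result_fn(x) for x in inputs if enable_fn(x)]
    reg.write_parallel(writes)
    return reg.value

def and_reduce(conds):
    # Preset to one; an operation writes zero when its own condition is false.
    return reduce_with(True, lambda c: False, lambda c: not c, conds)

def or_reduce(conds):
    # Preset to zero; an operation writes one when its own condition is true.
    return reduce_with(False, lambda c: True, lambda c: c, conds)
```

In this model every enabled operation of a given reduction writes the same constant, which is why the identical-writes restriction never triggers for a pure AND or OR reduction.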
Abstract:
A compiler technique for reducing the number of executed branches in a code sequence. Multiple conditional branch instructions in a program sequence are replaced with a single combined conditional branch instruction, thereby eliminating the time-consuming execution of multiple branch instructions by a target processor.
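The effect on the number of executed branches can be sketched with a toy interpreter. This is an illustration under assumptions, not the patented transformation itself: the abstract does not say what happens when the combined branch is taken, so the off-trace step that re-tests the original conditions is a guess, and all function and label names are hypothetical.

```python
def run_original(conds, targets, fallthrough):
    """Execute one conditional branch per condition, in program order."""
    branches = 0
    for cond, target in zip(conds, targets):
        branches += 1
        if cond:
            return target, branches
    return fallthrough, branches

def run_combined(conds, targets, fallthrough):
    """Execute a single branch on the OR of all conditions on the main path."""
    branches = 1                        # the combined conditional branch
    if not any(conds):
        return fallthrough, branches    # common case: only one branch executed
    # Assumed off-trace code: re-test the original conditions to pick the target.
    for cond, target in zip(conds, targets):
        branches += 1
        if cond:
            return target, branches
    raise AssertionError("unreachable: some condition was true")
```

When no condition holds, the combined version executes one branch where the original executes one per condition. The extra cost lands on the path where a branch is actually taken.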
Abstract:
An improved computer architecture and instruction set that reduces the delays produced by branch instructions. The invention utilizes a branch processor having a branch memory for storing information specifying a plurality of branch instructions that are contained in a code sequence. The branch memory stores information specifying the target address of each branch instruction and the location of the branch instruction with respect to the beginning of the code sequence. The branch processor receives the results of the various comparisons that determine whether the conditions associated with the various branches stored in the branch memory are satisfied. The branch processor preferably stores the identity of the branch that is closest to the beginning of the code sequence and for which the associated condition has been satisfied. This branch will be referred to as the highest branch enabled. The actual branching operation is carried out in response to the receipt of an execute branch instruction, which specifies one or more of the branches stored in the branch memory. If one of the branches specified in the execute branch instruction matches the highest branch enabled, then the code sequence continues at the target address of the highest branch enabled.
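The bookkeeping described above can be modeled in a few lines. The sketch rests on assumptions: the class and method names (`BranchProcessor`, `prepare`, `report_compare`, `execute_branch`) are hypothetical, and falling through when no specified branch matches is a guess, since the abstract only describes the matching case.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BranchEntry:
    target: int      # address at which execution continues if the branch is taken
    position: int    # location relative to the beginning of the code sequence

class BranchProcessor:
    def __init__(self):
        self.memory = {}                # branch memory: branch id -> BranchEntry
        self.highest_enabled = None     # id of the enabled branch nearest the start

    def prepare(self, branch_id, target, position):
        """Record a branch in the branch memory ahead of time."""
        self.memory[branch_id] = BranchEntry(target, position)

    def report_compare(self, branch_id, satisfied):
        """Receive a comparison result and update the highest branch enabled."""
        if not satisfied:
            return
        current = self.highest_enabled
        if (current is None
                or self.memory[branch_id].position < self.memory[current].position):
            self.highest_enabled = branch_id

    def execute_branch(self, branch_ids, fallthrough):
        """Branch if one of the specified branches is the highest branch enabled."""
        if self.highest_enabled in branch_ids:
            return self.memory[self.highest_enabled].target
        return fallthrough              # assumed behavior when nothing matches
```

Comparison results may arrive in any order in this model. Only the enabled branch with the smallest `position` is remembered, which mirrors the priority of the earliest branch in the original sequence.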