Abstract:
Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.
Abstract:
Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector loads. An example apparatus includes a node group to associate a vector operation with a node group based on a load type of the vector operation. The example apparatus also includes a candidate identifier to identify a candidate in the node group, the candidate to include a subset of vector operations of the node group. The example apparatus also includes a code optimizer to determine replacement code based on a characteristic of the candidate, and to compare an estimated cost associated with executing the replacement code to a threshold cost relative to a cost of executing the candidate. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold cost.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector loads. An example apparatus includes a node group to associate a vector operation with a node group based on a load type of the vector operation. The example apparatus also includes a candidate identifier to identify a candidate in the node group, the candidate to include a subset of vector operations of the node group. The example apparatus also includes a code optimizer to determine replacement code based on a characteristic of the candidate, and to compare an estimated cost associated with executing the replacement code to a threshold cost relative to a cost of executing the candidate. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold cost.
Abstract:
Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector load operations. An example apparatus includes a node grouper to associate a vector operation with a node group, a candidate verifier to perform a dependencies test on a subset of the node group, and identify a subset of the node group as a candidate when the subset satisfies the dependencies test, and a code optimizer to determine replacement code based on a characteristic of the candidate in the node group and compare an estimated cost associated with executing the replacement code to a threshold. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector load operations. An example apparatus includes a node grouper to associate a vector operation with a node group, a candidate verifier to perform a dependencies test on a subset of the node group, and identify a subset of the node group as a candidate when the subset satisfies the dependencies test, and a code optimizer to determine replacement code based on a characteristic of the candidate in the node group and compare an estimated cost associated with executing the replacement code to a threshold. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold.