Abstract:
A hazard check instruction has operands that specify addresses of vector elements to be read by first and second vector memory operations. The hazard check instruction outputs a dependency vector. For each element position of the first vector (the one corresponding to the first vector memory operation), the dependency vector identifies the element position of the second vector, if any, on which that element depends. In an embodiment, the addresses of the vector memory operations are specified using a base address for each vector memory operation and a vector that is shared by both vector memory operations. In an embodiment, the operands may include predicates for one or both of the vector memory operations, indicating which vector elements are active. The dependency vector may be qualified by the predicates, indicating dependencies only for active elements.
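The sketch below shows one way the dependency vector could be computed. It is a minimal model, not the claimed circuit, and several details are assumptions that the abstract does not state:

- each element address is `base + index * elem_size`, using the shared index vector;
- a hazard exists only when an earlier active element `j < i` of the second operation touches the same address as element `i` of the first;
- the nearest such `j` is reported;
- `-1` means "no dependency".

The function name `hazard_check` and its parameters are hypothetical.

```python
from typing import List, Optional


def hazard_check(base_a: int, base_b: int, index: List[int], elem_size: int,
                 pred_a: Optional[List[bool]] = None,
                 pred_b: Optional[List[bool]] = None) -> List[int]:
    """For each element i of the first operation, return the element position j
    of the second operation that i depends on, or -1 (hypothetical encoding)."""
    n = len(index)
    pred_a = pred_a if pred_a is not None else [True] * n
    pred_b = pred_b if pred_b is not None else [True] * n
    # One base address per memory operation, one index vector shared by both.
    addr_a = [base_a + idx * elem_size for idx in index]
    addr_b = [base_b + idx * elem_size for idx in index]
    deps = [-1] * n
    for i in range(n):
        if not pred_a[i]:
            continue  # inactive elements of the first operation report no dependency
        # Assumption: scan earlier positions, nearest first; skip inactive ones.
        for j in range(i - 1, -1, -1):
            if pred_b[j] and addr_b[j] == addr_a[i]:
                deps[i] = j
                break
    return deps


# Element 2 reads address 100, which element 0 of the second operation also touches.
print(hazard_check(100, 100, [0, 1, 0, 2], 4))  # [-1, -1, 0, -1]
```

Passing `pred_b=[False, True, True, True]` makes element 0 of the second operation inactive. The dependency vector then becomes all `-1`, which shows the predicate qualification described above.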
Abstract:
In an embodiment, a processor may include a completion time determination circuit. The completion time determination circuit may be configured to receive one or more source operands of a vector memory operation used to produce the addresses of the vector elements accessed by the vector memory operation. The completion time determination circuit may be configured to determine a completion time for the vector memory operation (e.g. based on a number of TLB accesses, a number of cache accesses, and/or other aspects of the vector memory operation). The completion time determination circuit may provide the completion time to an issue circuit, which may use the completion time to schedule operations dependent on the vector memory operation, if any.
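The following sketch illustrates how a completion time might be derived from the address-producing operands. It counts the distinct pages touched (as a proxy for TLB accesses) and the distinct cache lines touched (as a proxy for cache accesses). The page size, line size and cycle costs are hypothetical placeholders, and elements are assumed to be aligned so that none straddles a line. The `dependent_ready_cycle` helper is likewise an assumed model of how an issue circuit could use the result.

```python
from typing import List

# Hypothetical parameters; the abstract gives no sizes or latencies.
PAGE_SIZE = 4096
LINE_SIZE = 64
BASE_LATENCY = 4
TLB_ACCESS_CYCLES = 1
CACHE_ACCESS_CYCLES = 2


def completion_time(base: int, index: List[int], elem_size: int) -> int:
    """Estimate cycles for a vector memory operation from its source operands."""
    addrs = [base + i * elem_size for i in index]
    pages = {a // PAGE_SIZE for a in addrs}   # one TLB access per distinct page
    lines = {a // LINE_SIZE for a in addrs}   # one cache access per distinct line
    return (BASE_LATENCY
            + len(pages) * TLB_ACCESS_CYCLES
            + len(lines) * CACHE_ACCESS_CYCLES)


def dependent_ready_cycle(issue_cycle: int, base: int, index: List[int],
                          elem_size: int) -> int:
    """Earliest cycle at which a dependent operation could be scheduled."""
    return issue_cycle + completion_time(base, index, elem_size)


print(completion_time(0, [0, 1, 2, 3], 4))     # contiguous: 1 page, 1 line -> 7
print(completion_time(0, [0, 1024, 2048], 4))  # gather across 3 pages -> 13
```

A contiguous access is cheap under this model, while a gather that spans many pages or lines is estimated to take longer. That is the information the issue circuit would use to schedule dependents.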
Abstract:
In an embodiment, a processor may be configured to dynamically infer one or more attributes of the input and/or output registers of an instruction, given the attributes corresponding to at least one input register. The inference may be made at the issue circuit/stage of the processor for those registers that do not have attribute information at the issue circuit/stage. In an embodiment, the processor may also include a register attribute tracker configured to track attributes of registers prior to the issue stage of the processor pipeline. The processor may feed back, to the register attribute tracker, the inferred attributes and the register addresses of the registers to which the inferred attributes apply. The register attribute tracker may be configured to associate each inferred attribute with the identified register. The register attribute tracker may also be configured to infer input register attributes from other input register attributes.
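The sketch below models the inference and feedback loop. It makes two assumptions that the abstract does not state. First, the attribute is an element width. Second, the inference rule is that every register of the instruction shares that width. The issue stage's known attributes are also simplified to a single dictionary. The names `infer_attributes` and `RegisterAttributeTracker` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def infer_attributes(srcs: List[str], dst: str,
                     known: Dict[str, int]) -> List[Tuple[str, int]]:
    """Infer missing attributes from any input register whose attribute is known.
    Hypothetical rule: all registers of the instruction share one element width.
    Returns (register, attribute) pairs to feed back to the tracker."""
    width: Optional[int] = next((known[r] for r in srcs if r in known), None)
    if width is None:
        return []  # no input attribute to infer from
    # dict.fromkeys de-duplicates registers while keeping their order.
    return [(reg, width) for reg in dict.fromkeys(srcs + [dst]) if reg not in known]


class RegisterAttributeTracker:
    def __init__(self) -> None:
        self.attrs: Dict[str, int] = {}

    def apply_feedback(self, feedback: List[Tuple[str, int]]) -> None:
        # Associate each inferred attribute with the identified register.
        for reg, width in feedback:
            self.attrs[reg] = width


tracker = RegisterAttributeTracker()
tracker.attrs["v1"] = 32
fb = infer_attributes(["v1", "v2"], "v3", tracker.attrs)
tracker.apply_feedback(fb)
print(fb)  # [('v2', 32), ('v3', 32)]
```

Here the second input `v2` receives its attribute from the other input `v1`, which matches the input-from-input inference mentioned in the last sentence above.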
Abstract:
In an embodiment, a processor includes a register attribute tracker configured to track one or more attributes corresponding to registers. The register attribute tracker may track the attributes associated with registers when those registers are used as output registers of instructions that explicitly define the attributes. If the register attribute tracker has a tracked attribute associated with an input register of an instruction that does not explicitly define the attribute, the register attribute tracker may annotate the instruction with an attribute and/or associate an attribute with the output register of the instruction in the register attribute tracker.
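A small model of this tracker follows. It is an illustration under stated assumptions rather than the described hardware. Instructions are assumed to carry an optional explicit attribute, such as an element width. The propagation rule used here is to take the attribute from the first input register that has one. The `Instr` and `AttributeTracker` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    srcs: List[str]
    dst: str
    attr: Optional[int] = None  # explicit attribute, e.g. element width (hypothetical)


class AttributeTracker:
    def __init__(self) -> None:
        self.attrs: Dict[str, int] = {}

    def process(self, instr: Instr) -> Instr:
        if instr.attr is not None:
            # The instruction explicitly defines the attribute: track it for the output.
            self.attrs[instr.dst] = instr.attr
            return instr
        tracked = next((self.attrs[r] for r in instr.srcs if r in self.attrs), None)
        if tracked is not None:
            instr.attr = tracked             # annotate the instruction
            self.attrs[instr.dst] = tracked  # associate with the output register
        else:
            # Nothing known: drop any stale attribute of the overwritten register.
            self.attrs.pop(instr.dst, None)
        return instr


t = AttributeTracker()
t.process(Instr(["v0"], "v1", attr=16))     # explicit width 16 recorded for v1
annotated = t.process(Instr(["v1", "v2"], "v3"))
print(annotated.attr, t.attrs["v3"])        # 16 16
```

The stale-entry removal is a design choice of this sketch. Without it, an output register overwritten by an instruction with no known attribute would keep an outdated attribute.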
Abstract:
Systems and methods for the parallelization of software applications are described. In some embodiments, a compiler may automatically identify, within source code, the dependencies of a function called by another function. A persistent database may be generated to store the identified dependencies. When calls to the function are encountered within the source code, the persistent database may be checked, and a parallelized implementation of the function may be employed dependent upon the dependencies indicated in the persistent database.
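The following sketch shows the record-then-consult pattern using Python's standard `sqlite3` module as the persistent database. The single-table schema and the helper names `open_db`, `record` and `choose_impl` are hypothetical. The real dependency information is reduced here to one flag: whether the function has a dependency that prevents parallelization. Functions with no entry are treated conservatively and left serial.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) the persistent dependency database at `path`."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS deps ("
                 "func TEXT PRIMARY KEY, blocks_parallel INTEGER NOT NULL)")
    return conn


def record(conn: sqlite3.Connection, func: str, blocks_parallel: bool) -> None:
    """Store the dependency analysis result for a called function."""
    conn.execute("INSERT OR REPLACE INTO deps VALUES (?, ?)",
                 (func, int(blocks_parallel)))
    conn.commit()


def choose_impl(conn: sqlite3.Connection, func: str) -> str:
    """At a call site, pick an implementation based on the stored dependencies."""
    row = conn.execute("SELECT blocks_parallel FROM deps WHERE func = ?",
                       (func,)).fetchone()
    if row is None or row[0]:
        return "serial"   # unknown or dependent: keep the serial version
    return "parallel"


conn = open_db(":memory:")  # a file path would make the results persist across runs
record(conn, "scale", False)
record(conn, "prefix_sum", True)
print(choose_impl(conn, "scale"), choose_impl(conn, "prefix_sum"))  # parallel serial
```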
Abstract:
Systems, apparatuses and methods for utilizing enhanced vector true/false instructions. The enhanced vector true/false instructions generate enhanced predicates that correspond to the requested element width and/or vector size. A vector true instruction generates an enhanced predicate in which all elements supported by the processing unit are active. A vector false instruction generates an enhanced predicate in which all elements supported by the processing unit are inactive. The enhanced predicate specifies the requested element width in addition to designating the element selectors.
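The sketch below models an enhanced predicate as an element width plus a tuple of element selectors. It assumes a hypothetical 128-bit vector, so the number of supported elements is `vector_bits // element_width`. The names `EnhancedPredicate`, `vector_true` and `vector_false` are illustrative and do not come from the source.

```python
from dataclasses import dataclass
from typing import Tuple

VECTOR_BITS = 128  # hypothetical vector size supported by the processing unit


@dataclass(frozen=True)
class EnhancedPredicate:
    element_width: int         # requested element width, in bits
    active: Tuple[bool, ...]   # one element selector per supported element


def vector_true(element_width: int, vector_bits: int = VECTOR_BITS) -> EnhancedPredicate:
    """All elements supported at this width are active."""
    lanes = vector_bits // element_width
    return EnhancedPredicate(element_width, (True,) * lanes)


def vector_false(element_width: int, vector_bits: int = VECTOR_BITS) -> EnhancedPredicate:
    """All elements supported at this width are inactive."""
    lanes = vector_bits // element_width
    return EnhancedPredicate(element_width, (False,) * lanes)


print(len(vector_true(32).active), len(vector_true(8).active))  # 4 16
```

Because the width travels with the predicate, the same instruction yields 4 selectors for 32-bit elements and 16 selectors for 8-bit elements.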
Abstract:
Systems, apparatuses and methods for utilizing enhanced Macroscalar comparison operations, which take an enhanced predicate operand that designates the element width and which elements are to be processed. The element width and the number of elements per vector are determined at run time rather than being defined in the architectural definition of the instruction. This enables additional parallelism when processing smaller-sized data. The instruction performs the requested operation on the elements specified by the enhanced predicate, assuming an element width also specified by the enhanced predicate, and returns the result of the comparison as an enhanced predicate.
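To show how the run-time element width changes the lane count, the sketch below performs a "greater than" comparison. It splits raw vector bytes according to the width carried by the enhanced predicate and compares only the active elements. Several choices here are assumptions rather than details from the source:

- the comparison is unsigned, on little-endian element layout;
- inactive result elements are cleared (zeroing rather than merging);
- the names `EnhancedPredicate`, `split` and `compare_gt` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EnhancedPredicate:
    element_width: int         # bits per element, chosen at run time
    active: Tuple[bool, ...]   # element selectors


def split(vec: bytes, width_bits: int) -> List[int]:
    """Interpret raw vector bytes as unsigned little-endian elements."""
    step = width_bits // 8
    return [int.from_bytes(vec[i:i + step], "little") for i in range(0, len(vec), step)]


def compare_gt(a: bytes, b: bytes, pred: EnhancedPredicate) -> EnhancedPredicate:
    """Compare active elements; inactive results are False (zeroing assumed)."""
    xs = split(a, pred.element_width)
    ys = split(b, pred.element_width)
    result = tuple(act and x > y for act, x, y in zip(pred.active, xs, ys))
    return EnhancedPredicate(pred.element_width, result)


a = bytes([3, 0, 1, 2])
b = bytes([1, 0, 2, 0])
print(compare_gt(a, b, EnhancedPredicate(8, (True,) * 4)).active)   # 4 lanes
print(compare_gt(a, b, EnhancedPredicate(16, (True,) * 2)).active)  # 2 lanes
```

The same 4-byte vectors yield four comparisons at 8-bit width but only two at 16-bit width. This is the extra parallelism for smaller data that the abstract describes.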
Abstract:
Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a Boolean operation on another input vector dependent upon the input vector and the control vector.
Abstract:
Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive a basis vector, an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping propagate operation dependent upon the input vectors.