摘要:
A floating point execution unit calculates a one minus dot product value in a single pass. As such, the dependency that otherwise would be required to perform the calculations is eliminated, resulting in a substantially faster performance of such calculations. The floating point execution unit may be used, for example, to accelerate pixel shading algorithms such as Fresnel and electron microscope effects.
摘要:
An “early exit” of an iterative refinement algorithm is implemented by effectively disabling read after write dependency stalls of newer instructions, as well as disabling the register write enable of these instructions, for the remainder of the algorithm, in addition to disabling the register write enable of these instructions. By doing so, the latency of the algorithm is reduced and the performance is increased without the complexity and potential poor performance of compare and branch instructions that might otherwise be required.
摘要:
Persistent vector multiplexer control is used in a vector-based execution unit to control the shuffling of words in operand vectors processed by the execution unit. In addition, a persistent swizzle instruction is defined in an instruction set for the vector-based execution unit and is used to cause state information to be persisted such that the operand vectors processed by subsequent vector instructions executed by the vector-based execution unit will be selectively shuffled using the persisted state information. As a result, when multiple vector instructions require a common custom word ordering for one or more operand vectors, a single persistent swizzle instruction may be used to select the desired custom word ordering for all of the vector instructions.
摘要:
The present invention is generally related to the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve rearranging vector operands in one or more source registers prior to performing vector operations. Typically, rearranging of operands in source registers is done by issuing a plurality of permute instructions that require excessive usage of temporary registers. Furthermore, the permute instructions may cause dependencies between instructions executing in a pipeline, thereby adversely affecting performance. Embodiments of the invention provide a level of muxing between a register file and a vector unit that allow for rearrangement of vector operands in source registers prior to providing the operands to the vector unit, thereby obviating the need for permute instructions.
摘要:
A vectorizable execution unit is capable of being operated in a plurality of modes, with the processing lanes in the vectorizable execution unit grouped into different combinations of logical execution units in different modes. By doing so, processing lanes can be selectively grouped together to operate as different types of vector execution units and/or scalar execution units, and if desired, dynamically switched during runtime to process various types of instruction streams in a manner that is best suited for each type of instruction stream. As a consequence, a single vectorizable execution unit may be configurable, e.g., via software control, to operate either as a vector execution or a plurality of scalar execution units.
摘要:
A logic arrangement and method to support implied storage operation decode uses redundant target address detection, whereby target addresses of previous instructions are compared with the target address of the current instruction, and if equal, and the target addresses of previous instructions are not used as sources, the current instruction is decoded as a store instruction. This allows a redundant operation in an instruction set architecture to be redefined as a store instruction, freeing up opcodes normally used for store instructions to be used for other instructions.
摘要:
A method, program product and system for conducting a ray tracing operation where the rendering compute requirement is reduced or otherwise adjusted in response to a changing vantage point. Aspects may update or reuse an acceleration data structure between frames in response to the changing vantage point. Tree and image construction quality may be adjusted in response to rapid changes in the camera perspective. Alternatively or additionally, tree building cycles may be skipped. All or some of the tree structure may be built in intervals, e.g., after a preset number of frames. More geometric image data may be added per leaf node in the tree in response to an increase in the rate of change. The quality of the rendering algorithm may additionally be reduced. A ray tracing algorithm may decrease the depth of recursion, and generate fewer cast and secondary rays. The ray tracer may further reduce the quality of soft shadows, resolution and global illumination samples, among other quality parameters. Alternatively, tree rebuilding may be skipped entirely in response to a high camera rate of change. Associated processes may create blur between frames to simulate motion blur.
摘要:
A software-accessible special purpose register is architected into a processing unit in order to implement persistent vector multiplexer control of a vector-based execution unit. A persistent swizzle instruction is defined in an instruction set for the vector-based execution unit and is used to cause state information to be stored in the special purpose register such that the operand vectors processed by subsequent vector instructions executed by the vector-based execution unit will be selectively shuffled using the persisted state information. As a result, when multiple vector instructions require a common custom word ordering for one or more operand vectors, a single persistent swizzle instruction may be used to select the desired custom word ordering for all of the vector instructions.
摘要:
An approach to optimize specular highlight generation is presented. A single microprocessor instruction is used to generate an intensity value based upon a viewing angle value. An application stores a viewing angle value in an input register. When called, the “intensity instruction” retrieves the viewing angle value from the input register, and calculates an intensity value using three distinct steps. In turn, the intensity instruction stores the intensity value in an output register for the application to retrieve and further process. In one embodiment, the invention may be implemented using PowerPC™ assembly and VMX™ or Altivec™ instructions. In this embodiment, the intensity instruction may be represented as a “vspecefp” instruction, which stands for a “vector specular estimate floating point” instruction.
摘要:
A method, computer-readable medium, and an apparatus for implementing a floating point weighted average function. The method includes receiving an input containing 2N input values, 2N weights, and an opcode, where N is a positive integer number and each of the input values corresponds to one of the weights. Furthermore, the method also includes using existing dot product circuit function to generate 2N addends by multiplying each of the input values with the corresponding weight. In addition, the method includes generating a sum value by adding the 2N addends, where the sum value includes an exponent value, and generating the weighted average value based on the sum value by decreasing the exponent value by N. In this fashion, the same circuit area may be used to carry out both dot product and weighted average calculations, leading to greater circuit area savings and performance advantages.