Abstract:
Embodiments of the invention provide methods and apparatus for reallocating workload related to traversal of a ray through a spatial index. In a first operating state a workload manager may be experiencing a first or a normal workload. In the first operating state the workload manager may be responsible for traversing the entire spatial index and a vector throughput engine may be responsible for performing ray-primitive intersection tests. In an increased workload state the workload manager may experience an increased workload. In response to the increased workload the image processing system may partition the spatial index such that the workload manager may be responsible for traversing a first portion of the spatial index and the vector throughput engine may be responsible for traversing a second portion of the spatial index and for performing ray-primitive intersection tests.
Abstract:
Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises computing an arithmetic result of a pair of operands in each processing lane of a vector unit. The arithmetic results generated in each processing lane of the vector unit may be transferred to a dot product unit. The dot product unit may compute an arithmetic result using the arithmetic result computed by each processing lane of the vector unit to generate an arithmetic result of more than two operands.
Abstract:
A floating point execution unit calculates a one minus dot product value in a single pass. As such, the dependency that otherwise would be required to perform the calculations is eliminated, resulting in a substantially faster performance of such calculations. The floating point execution unit may be used, for example, to accelerate pixel shading algorithms such as Fresnel and electron microscope effects.
Abstract:
An “early exit” of an iterative refinement algorithm is implemented by effectively disabling read after write dependency stalls of newer instructions, as well as disabling the register write enable of these instructions, for the remainder of the algorithm, in addition to disabling the register write enable of these instructions. By doing so, the latency of the algorithm is reduced and the performance is increased without the complexity and potential poor performance of compare and branch instructions that might otherwise be required.
Abstract:
Persistent vector multiplexer control is used in a vector-based execution unit to control the shuffling of words in operand vectors processed by the execution unit. In addition, a persistent swizzle instruction is defined in an instruction set for the vector-based execution unit and is used to cause state information to be persisted such that the operand vectors processed by subsequent vector instructions executed by the vector-based execution unit will be selectively shuffled using the persisted state information. As a result, when multiple vector instructions require a common custom word ordering for one or more operand vectors, a single persistent swizzle instruction may be used to select the desired custom word ordering for all of the vector instructions.
Abstract:
A circuit arrangement and method bypass the storage of requested data in a higher level cache of a multi-level memory architecture during the return of the requested data to a requester, while caching the requested data in a lower level cache. For certain types of data, e.g., data that is only used once and/or that is rarely modified or written back to memory, bypassing storage in a higher level cache reduces the likelihood of the requested data casting out frequently used data from the higher level cache. By caching the data in a lower level cache, however, the lower level cache can still snoop data requests and return requested data in the event the data is already cached in the lower level cache.
Abstract:
The present invention is generally related to the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve rearranging vector operands in one or more source registers prior to performing vector operations. Typically, rearranging of operands in source registers is done by issuing a plurality of permute instructions that require excessive usage of temporary registers. Furthermore, the permute instructions may cause dependencies between instructions executing in a pipeline, thereby adversely affecting performance. Embodiments of the invention provide a level of muxing between a register file and a vector unit that allow for rearrangement of vector operands in source registers prior to providing the operands to the vector unit, thereby obviating the need for permute instructions.
Abstract:
A vectorizable execution unit is capable of being operated in a plurality of modes, with the processing lanes in the vectorizable execution unit grouped into different combinations of logical execution units in different modes. By doing so, processing lanes can be selectively grouped together to operate as different types of vector execution units and/or scalar execution units, and if desired, dynamically switched during runtime to process various types of instruction streams in a manner that is best suited for each type of instruction stream. As a consequence, a single vectorizable execution unit may be configurable, e.g., via software control, to operate either as a vector execution or a plurality of scalar execution units.
Abstract:
A logic arrangement and method to support implied storage operation decode uses redundant target address detection, whereby target addresses of previous instructions are compared with the target address of the current instruction, and if equal, and the target addresses of previous instructions are not used as sources, the current instruction is decoded as a store instruction. This allows a redundant operation in an instruction set architecture to be redefined as a store instruction, freeing up opcodes normally used for store instructions to be used for other instructions.
Abstract:
A software-accessible special purpose register is architected into a processing unit in order to implement persistent vector multiplexer control of a vector-based execution unit. A persistent swizzle instruction is defined in an instruction set for the vector-based execution unit and is used to cause state information to be stored in the special purpose register such that the operand vectors processed by subsequent vector instructions executed by the vector-based execution unit will be selectively shuffled using the persisted state information. As a result, when multiple vector instructions require a common custom word ordering for one or more operand vectors, a single persistent swizzle instruction may be used to select the desired custom word ordering for all of the vector instructions.