摘要:
The data processing system loads three input operands, including two input vectors and a control vector, into vector registers and performs a permutation of the two input vectors as specified by the control vector, and further stores the result of the operation as the output operand in an output register. The control vector consists of sixteen indices, each uniquely identifying a single byte of input data in either of the input registers, and can be specified in the operational code or be the result of a computation previously performed within the vector registers. The control vector is specified by calculating the offset of a selected vector element of the input vector relative to a base address of the input vector and loading each element with an index equal to the relative offset. Alternatively, the generation of the alignment vector is made by performing a look-up within a look-up table. For additional loads from the same vector, the control vector does not change, since the alignment shift amount of the vector from an address boundary does not change. A permutation instruction can then be executed to load and shift the data to realign it in the output register at the vector boundary.
摘要:
A vector multiplication mechanism is provided that partitions vector multiplication operation into even and odd paths. In an odd path, odd data elements of first and second source vectors are selected, and multiplication operation is performed between each of the selected odd data elements of the first source vector and corresponding one of the selected odd data elements of the second source vector. In an even path, even data elements of the source vectors are selected, and multiplication operation is performed between each of the selected even data elements of the first source vector and corresponding one of the selected even data elements of the second source vector. Elements of resultant data of the two paths are merged together in a merge operation. The vector multiplication mechanism of the present invention preferably uses a single general-purpose register to store the resultant data of the odd path and the even path. In addition, computational overhead of the merge operation is amortized over a series of vector operations.
摘要:
A data processing system includes a data processor (10) coupled to a memory system having a first memory, such as an L1 data cache (16), arranged with a second memory (such as an L2 cache) at a lower hierarchical level. The data processor (10) prefetches data elements of a vector into the first memory prior to processing such data elements. If a requested data element is not present in the first memory, a load request is issued to the second memory and to lower levels of the memory hierarchy until the requested data element is finally retrieved and stored in the first memory. The data processor (10) continues to prefetch subsequent data elements of the vector by considering the length of the data element and the stride of the vector. In one embodiment, the data processor (10) prefetches the vector into the first memory in response to a single data stream touch load (DST) instruction (100).
摘要:
The invention relates to a method of using a “bounds” comparator scheme and to a “bounds” comparator circuit. The method of using this scheme or comparator circuit allows a quick and easy test to characterize, utilizing a single floating-point bounds comparison function, the location of a point with respect to pre-defined end- points. The single floating-point bounds comparison function represents an additional instruction to be incorporated within computer instruction set architectures when performing trivial acceptance testing during the generation of three-dimensional images or graphics.
摘要:
A method and system is disclosed which summarizes the results of a classical single-instruction multiple-data SIMD predicate comparison operation, signaling whether all comparisons resulted in a false result or true result, and placing that status into a separate status register, such as the Power PC Condition Register. The method and system utilizes first and second status bits to support the signaling whether all element comparisons resulted in true or false. The first status bit is set when all element comparisons resulted in false (i.e. a NOR of all predicate comparison results), and the second status bit is set when all element comparisons resulted in true (i.e. an AND of all predicate comparison results). This capability allows control flow using conditional branching on the event when all comparison results are false or when all comparison results are true. The method and system of the present invention is useful in 3-D graphics such as lighting and trivial acceptance testing where executing down both paths of a branch and then selecting the correct result is not tolerable.
摘要:
The data processing system of the present invention loads three input operands, including two input vectors and a control vector, into vector registers and performs a permutation of the two input vectors as specified by the control vector, and further stores the result of the operation as the output operand in an output register. The control vector consists of sixteen indices, each uniquely identifying a single byte of input data in either of the input registers, and can be specified in the operational code or be the result of a computation previously performed within the vector registers. The specification of the control vector allows a vector-matrix operation to be performed on the input vectors by rearranging or replicating the input operand bytes in the bytes of the output register as a function of the control vector. This system provides a highly efficient register loading mechanism for data vectors misaligned in memory, and allows the computation of a serially dependent chain of binary functions within the vector registers.