摘要:
Instruction methods for moving data between memory and a vector register file while performing data formatting. The methods are processed by a processor having a vector register file and a memory unit. The methods are useful in the graphics art because they allow more efficient movement and processing of raster formatted graphics data. The vector register file has a number of vector registers (e.g., 32) that each contain multi-bits of storage (e.g., 128 bits). In one class of instructions, eight byte locations within memory are simultaneously loaded into eight separate 16 bit locations within a register of the register file. The data can be integer or fraction and signed or unsigned. The data can also be stored from the register file back to memory. In a second class of instructions, alternate locations of a memory qaudword are selected and simultaneously loaded in the register file. In a third class, data is obtained across a word boundary by a first instruction that obtains a first part and a second instruction that obtains the remainder part crossing the boundary. In a last class of instruction transfers, a block (e.g., 8 16-bit.times.8 16-bit) of data is loaded from memory, stored in the register file and stored back into memory causing a transposition of the data block (16 cycles). A block (e.g., 8 16-bit.times.8 16-bit) of data is stored from the register file to memory, and loaded back into the register file causing a transposition of the data block (16 cycles).
摘要:
A method and arrangement for separating interleaved luminance and chrominance color space components data in a single data stream with minimum CPU intervention is provided. In the separating circuit, the separating circuit receives as input a series of graphics/video image data composed of interleaved luminance and chrominance color space components at successive clock cycles. The separating circuit directs selected bytes of the graphics/video image data representing the luminance color space component to a first path wherein luminance component data received at two successive clock cycles are combined. Likewise, selected bytes of the graphics/video image data representing the chrominance color space component are directed to a second path wherein chrominance component data received at two successive clock cycles are combined. Then, the combined luminance and chrominance component data are output alternately. Conversely, a method and arrangement for interleaving luminance and chrominance color space components data in stored separately into a single data stream is also provided.
摘要:
The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. Then, a subset of elements are selected from the first register and the second register. The elements from the subset are then replicated into the elements in the third register in a particular order suitable for subsequent SIMD vector processing.
摘要:
The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. Then, a subset of elements are selected from the first register and the second register. The elements from the subset are then replicated into the elements in the third register in a particular order suitable for subsequent SIMD vector processing.
摘要:
The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. Then, a subset of elements are selected from the first register and the second register. The elements from the subset are then replicated into the elements in the third register in a particular order suitable for subsequent SIMD vector processing.
摘要:
The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. Then, a subset of elements are selected from the first register and the second register. The elements from the subset are then replicated into the elements in the third register in a particular order suitable for subsequent SIMD vector processing.
摘要:
The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register. Then, a subset of elements are selected from the first register and the second register. The elements from the subset are then replicated into the elements in the third register in a particular order suitable for subsequent SIMD vector processing.
摘要:
The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.
摘要:
The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.
摘要:
A method and arrangement for a dma transfer mode having multiple transactions is provided. The invention generates a set of transaction entries for a DMA transfer each of which contains information related to the address and command instruction of a transaction. The transaction entries are stored in an address/cmd-output-FIFO. The invention negotiates for the control of the system bus. Upon gaining control of the bus, the commands and address relate to each transaction are sequentially place on the system bus. If the transaction is a read operation, data received back from the system bus is first stored in a data-in-FIFO before being sent to the desired destination. If the transaction is a write operation, the data to be transferred is first stored in a data-out-FIFO before being timely place on the system bus for transferring to the desired destination. In either case, the number of data words transferred is monitored to determine when a transaction is complete. The number of transactions carried out is also monitored to determine when a DMA transfer is complete.