摘要:
Embodiments of the present invention are directed to an architecture structured to handle unaligned memory references. In one embodiment, a method for loading unaligned data stored in a plurality of memory locations comprises loading a first part of the unaligned data into a first storage location; rotating and sign-extending the first part of the unaligned data in the first storage location from a first position to a second position; loading a second part of the unaligned data into a second storage location; rotating the second part of the unaligned data in the second storage location from a third position to a fourth position; and combining the first storage location with the second location using a logical operation into a result storage location.
摘要:
A method and system for reducing a latency of microprocessor instructions in transit along an instruction pipeline of a microprocessor by bypassing, at certain times, a fill buffer located between an instruction source and a trace cache unit on the instruction pipeline. The signal path through the fill buffer to the trace cache unit represent a first signal path. In the instruction pipeline, a second signal path is also provided, one which also leads instructions to the trace cache unit, not through the fill buffer, but through a latch provided on the second instruction path. If the latch is enabled, a set of instructions appearing at the input of the fill buffer is transmitted through the latch along the second instruction path and to the trace cache. As a result, the fill buffer is bypassed and a reduced latency for the bypassed instructions is achieved along the instruction pipeline.
摘要:
An apparatus and method for bi-directional format conversion and transfer of data between integer and floating point registers is provided. A floating point register is configured to store floating point data, and integer data, in a variety of numerical formats. Data is moved in and out of the floating point register as integer data, and is converted into floating point format as needed. Separate processor instructions are provided for format conversion and data transfer to allow conversion and transfer operations to be separated.
摘要:
A barrier control scheme controls the order dependency of items in a multiple FIFO queue structure. The barrier control scheme includes a cycle ID generator, a barrier bit/barrier ID generator and a cycle ID and barrier ID comparator. Each incoming item to the FIFOs is assigned a cycle ID. If an incoming item of a first FIFO has order dependency on items of a second FIFO, a barrier bit is set and a barrier ID is determined and generated by the barrier bit/barrier ID generator. The barrier bit and barrier ID are inserted in the first FIFO along with other fields of the incoming item. When an item is to be consumed, the cycle ID and barrier ID comparator compares its barrier ID and the cycle IDs of items in the second FIFO. The item to be consumed is blocked until all items on which the item is dependent are consumed in the second FIFO.
摘要:
The present invention relates to a data processing unit comprising a register file, a register load and store buffer connected to the register file, a single memory, and a bus having at least first and second word lines to form a double word wide bus coupling the register load and store buffer with said single memory. The register file at least two sets of registers whereby the first set of registers can be coupled with one of the word lines and the second set of registers can be coupled with the respective other word lines, a load and store control unit for transferring data from or to the memory.
摘要:
A data processing system is provided with a digital signal processor which has an instruction for expanding one bit to form a mask field. In one form of the instruction, a first bit from a two-bit mask in a source operand is replicated and placed in an least significant half word of a destination operand while a second bit from the two-bit mask in the source operand is replicated and placed in a most significant half word of the destination operand. In another form of the instruction, a first bit from a four bit mask in a source operand is replicated and placed in a least significant byte of a destination operand, a second bit from the four-bit mask in the source operand is replicated and placed in a second least significant byte of the destination operand, a third bit from the four-bit mask is replicated and placed in a second most significant byte of the destination operand and a fourth bit form the four-bit mask is replicated and placed in a most significant byte of the destination operand.
摘要:
A computer architecture to process move instructions by allowing multiple mappings between logical registers and the same physical register. In one embodiment, a counter is associated with each physical register to indicate when the physical register is free. A register-to-register move instruction is processed by mapping the logical destination register of the move instruction to the same physical register to which the logical source register of the move instruction is mapped. An immediate-to-register move instruction is processed by mapping the logical destination register of the move instruction to a physical register storing the immediate.
摘要:
A method and an apparatus for configuring arbitrary sized data paths comprising multiple context processing elements (MCPEs) are provided. Multiple MCPEs may be chained to form wider-word data paths of arbitrary widths, wherein a first ALU serves as the most significant byte (MSB) of the data path while a second ALU serves as the least significant byte (LSB) of the data path. The ALUs of the data path are coupled using a left-going, or forward, carry chain for transmitting at least one carry bit from the LSB ALU to the MSB ALU. The MSB ALU comprises configurable logic for generating at least one signal in response to a carry bit received over the left-going carry chain, the at least one signal comprising a saturation signal and a saturation value. The MCPEs of the data path use configurable logic to manipulate a resident bit sequence in response to the saturation signal transmitted thereby reconfiguring, or changing the operation of, the data path in response to he saturation signal. The carry chains support carry operations for non-local functions comprising minimum and maximum arithmetic operations.
摘要:
Deposit and extract instructions include an opcode, a source address, a destination address, a shift number, and a K-bit mask string. The opcode describes the operations to be performed upon a J-bit source string and an N-bit destination string. The source address points to the memory location of the J-bit source string. The destination address points to the memory location of the N-bit destination string. The shift number indicates the number of bits the J-bit source string is to be shifted to generate a shifted bit string. The combination of the shifted bit string with the N-bit destination string is conducted under the control of the K-bit mask string. The invention is useful for high speed digital data processing, such as that performed by devices operating under the IEEE 1394 protocol.
摘要:
Disclosed herein is a apparatus and method for packing a 16-bit number into an 8-bit result byte. The method and apparatus utilize a parallel processing right shift circuit and a filter to obtain desired results. The parallel processes are comprised of a plurality of multiplexers capable of discretely analyzing smaller groups of bits. In this manner, higher throughput may be obtained than previously known.