Abstract:
An apparatus for including in a processor a set of instructions that support operations on packed data required by typical multimedia applications. In one embodiment, the invention includes a processor having a storage area (150), a decoder (165), and a plurality of circuits (130). The plurality of circuits provide for the execution of a number of instructions to manipulate packed data. In this embodiment, these instructions include pack, unpack, packed multiply, packed add, packed subtract, packed compare, and packed shift.
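The packed add listed above can be illustrated in portable C by treating a 64-bit word as four 16-bit lanes. This is a minimal sketch that assumes 16-bit lanes and uses the hypothetical function names `packed_add16` and `packed_adds16`. It contrasts wraparound with signed saturation. The abstract does not say which overflow behavior the instructions use, so both are shown.

```c
#include <stdint.h>

/* Read lane i (0 = least significant) as an unsigned 16-bit value. */
static uint32_t lane_u16(uint64_t v, int i) {
    return (uint32_t)((v >> (16 * i)) & 0xFFFFu);
}

/* Read lane i as a signed 16-bit value without relying on implementation-defined casts. */
static int32_t lane_s16(uint64_t v, int i) {
    int32_t x = (int32_t)lane_u16(v, i);
    return x >= 0x8000 ? x - 0x10000 : x;
}

/* Packed add with wraparound: each lane is summed modulo 2^16, so no carry
   propagates into the neighboring lane. */
uint64_t packed_add16(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t sum = (lane_u16(a, i) + lane_u16(b, i)) & 0xFFFFu;
        r |= sum << (16 * i);
    }
    return r;
}

/* Packed add with signed saturation: each lane is clamped to [-32768, 32767]. */
uint64_t packed_adds16(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        int32_t sum = lane_s16(a, i) + lane_s16(b, i);
        if (sum > 32767) sum = 32767;
        if (sum < -32768) sum = -32768;
        r |= (uint64_t)(uint16_t)sum << (16 * i);
    }
    return r;
}
```

The per-lane loop stands in for parallel hardware. What matters is that no carry crosses from one lane into the next, and that property separates a packed add from an ordinary 64-bit add.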
Abstract:
A method and apparatus for executing floating-point and packed data instructions using a single physical register file that is aliased. According to one aspect of the invention, a processor is provided that includes a decode unit (1002), a mapping unit (1004), and a storage unit (1006). The decode unit (1002) is configured to decode instructions and their operands from at least one instruction set that includes at least a first and a second set of instructions. The storage unit (1006) includes a physical register file (1020). The mapping unit (1004) is configured to map operands used by the first set of instructions to the physical register file in a stack-referenced manner. In addition, the mapping unit (1004) is configured to map operands used by the second set of instructions to the same physical register file in a non-stack-referenced manner.
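The two mapping modes can be modeled in a few lines. This is a minimal sketch that assumes an eight-entry physical file and a wrapping top-of-stack pointer, loosely in the style of a stack-based floating-point unit. `RegFile`, `map_stack`, `map_flat`, and `stack_push` are hypothetical names and do not come from the source. Stack-referenced operands are resolved relative to the top of stack, while non-stack operands name a physical register directly, so both sets of instructions end up in the same storage.

```c
#include <stdint.h>

#define NUM_REGS 8 /* hypothetical size of the shared physical register file */

/* One physical register file shared by both instruction sets. */
typedef struct {
    uint64_t phys[NUM_REGS]; /* shared physical storage */
    unsigned tos;            /* top-of-stack index, used only by stack-referenced operands */
} RegFile;

/* Stack-referenced mapping (first instruction set): operand i is relative to the top of stack. */
unsigned map_stack(const RegFile *rf, unsigned st_index) {
    return (rf->tos + st_index) % NUM_REGS;
}

/* Non-stack mapping (second instruction set): operand i names physical register i directly. */
unsigned map_flat(unsigned reg_index) {
    return reg_index % NUM_REGS;
}

/* Push for the stack-referenced set: move the top of stack down (wrapping) and write there. */
void stack_push(RegFile *rf, uint64_t value) {
    rf->tos = (rf->tos + NUM_REGS - 1) % NUM_REGS;
    rf->phys[rf->tos] = value;
}
```

After one push from an initial top of stack of 0, the stack operand at offset 0 and the flat operand 7 resolve to the same physical entry. That shared entry is what aliasing means here.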
Abstract:
A processor includes a decoder (202) coupled to receive a control signal (207). The control signal has a first source address (602), a second source address (603), a destination address (605), and an operation field (601). The first source address corresponds to a first location, and the second source address corresponds to a second location. The destination address corresponds to a third location. The operation field indicates the type of packed data compare operation to be performed. The processor includes a circuit, coupled to the decoder, that compares a first packed data stored at the first location with a second packed data stored at the second location and communicates a corresponding packed data result to the third location.
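A packed compare can be sketched as a per-lane comparison that writes a mask into the result. This is a minimal sketch with several assumptions: four signed 16-bit lanes, a true lane written as all ones (0xFFFF), and a hypothetical `CmpOp` enum standing in for the operation field. None of these details are stated in the abstract.

```c
#include <stdint.h>

/* Hypothetical encoding of the compare type carried in the operation field. */
typedef enum { CMP_EQ, CMP_GT } CmpOp;

/* Read lane i (0 = least significant) as a signed 16-bit value. */
static int32_t lane_s16(uint64_t v, int i) {
    int32_t x = (int32_t)((v >> (16 * i)) & 0xFFFFu);
    return x >= 0x8000 ? x - 0x10000 : x;
}

/* Compare four signed 16-bit lanes; a true lane becomes 0xFFFF, a false lane 0x0000. */
uint64_t packed_compare16(uint64_t src1, uint64_t src2, CmpOp op) {
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        int32_t a = lane_s16(src1, i);
        int32_t b = lane_s16(src2, i);
        int hit = (op == CMP_EQ) ? (a == b) : (a > b);
        if (hit) result |= 0xFFFFull << (16 * i);
    }
    return result;
}
```

An all-ones/all-zeros mask is convenient because it can be combined with bitwise AND and OR to select between two packed values without branching.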
Abstract:
The power consumed within an integrated circuit (IC) is reduced by throttling the performance of particular functional units (105) within the IC. The recent utilization levels of particular functional units within an IC are monitored (108), for example, by computing each functional unit's average duty cycle over its recent operating history (106). If this activity level (109) is greater than a threshold, the functional unit is operated in a reduced-power mode (110). The threshold value is set large enough to allow short bursts of high utilization to occur. An IC can thus dynamically trade off high-speed operation against low-power operation by throttling back the performance of functional units whose utilization exceeds a sustainable level. This dynamic power/speed tradeoff can be optimized across multiple functional units within an IC or among multiple ICs within a system, and it can be altered by giving software control over the throttling parameters.
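One way to compute the recent duty cycle is a sliding window of per-cycle busy bits, with the mode switching when the busy count crosses a threshold. This is a minimal sketch under assumptions not taken from the source: a 32-cycle window, a busy/idle sample once per cycle, and the hypothetical names `Throttle` and `throttle_tick`. The `threshold` field plays the role of the software-controllable throttling parameter.

```c
#include <stdbool.h>
#include <stdint.h>

#define WINDOW 32u /* hypothetical history length, in cycles */

typedef struct {
    uint32_t history;   /* bit k set: the unit was busy k cycles ago */
    unsigned busy;      /* number of set bits in history (busy cycles in the window) */
    unsigned threshold; /* software-settable limit on busy cycles per window */
    bool reduced_power; /* true while the unit is throttled */
} Throttle;

/* Record one cycle of activity and update the power mode. */
void throttle_tick(Throttle *t, bool unit_busy) {
    const uint32_t oldest = (uint32_t)1 << (WINDOW - 1);
    if (t->history & oldest) t->busy--; /* this cycle is about to leave the window */
    t->history = (t->history << 1) | (unit_busy ? 1u : 0u);
    if (unit_busy) t->busy++;
    /* Average duty cycle = busy / WINDOW; throttle once it exceeds threshold / WINDOW. */
    t->reduced_power = t->busy > t->threshold;
}
```

With a threshold of 24 busy cycles out of 32 (a 75% duty cycle), a unit coming out of an idle period can run a burst of 24 consecutive busy cycles at full speed. Only sustained activity beyond that triggers the reduced-power mode.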
Abstract:
A computer system includes a multimedia input device that generates an audio or video input signal and a processor coupled to the multimedia input device. The system further includes a storage device, coupled to the processor, that stores a signal processing routine for multiplying and accumulating input values representative of the audio or video input signal. When executed by the processor, the signal processing routine causes the processor to perform several steps. First, a packed multiply-add is performed on a first set of values packed into a first source and a second set of values packed into a second source, each representing input signals, to generate a packed intermediate result. The packed intermediate result is then added to an accumulator to generate a packed accumulated result in the accumulator. These steps may be iterated with the first set of values and further portions of the second set of values, adding each new intermediate result to the accumulator, to build up the packed accumulated result. Finally, the packed accumulated result in the accumulator is unpacked into a first result and a second result, which are added together to generate an accumulated result.
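The multiply-accumulate routine can be sketched as a dot product that keeps a two-lane packed accumulator and collapses it only at the end. This is a minimal sketch assuming four signed 16-bit samples per packed source, a sample count that is a multiple of four, and inputs small enough that every 32-bit sum fits without overflow. `packed_multiply_add` and `packed_dot` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a packed multiply-add: four signed 16-bit products, with adjacent
   pairs summed into two 32-bit lanes (assumes the sums fit in int32_t). */
static void packed_multiply_add(const int16_t a[4], const int16_t b[4], int32_t out[2]) {
    out[0] = (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];
    out[1] = (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
}

/* Dot product of n samples (n a multiple of 4) using a two-lane packed accumulator. */
int32_t packed_dot(const int16_t *x, const int16_t *h, size_t n) {
    int32_t acc[2] = {0, 0}; /* packed accumulator */
    for (size_t i = 0; i < n; i += 4) {
        int32_t partial[2]; /* packed intermediate result */
        packed_multiply_add(&x[i], &h[i], partial);
        acc[0] += partial[0]; /* packed add into the accumulator */
        acc[1] += partial[1];
    }
    /* Unpack the accumulator into a first and second result and add them. */
    return acc[0] + acc[1];
}
```

Keeping the two lanes separate inside the loop avoids a horizontal add on every iteration. Only one unpack-and-add is needed after the last block of samples.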
Abstract:
A computer system and method in which allocation of a cache memory (21a, 22a) is managed by means of a locality hint value (17, 18), included within an instruction (19), that controls whether cache allocation is to be made. The locality value is based on the spatial and/or temporal locality of a data access and may be assigned to each level of a cache hierarchy where allocation control is desired. The locality hint value may be used to identify the lowest level at which management of cache allocation is desired; cache is then allocated at that level and at any higher level or levels. If the locality hint identifies a particular data access as temporal or non-temporal with respect to a particular cache level, the access may be determined to be temporal or non-temporal with respect to the higher and lower cache levels.
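The allocation rule can be expressed as a predicate over cache levels. This is a minimal sketch with assumptions that are not in the source. The hierarchy has three levels, and level 1 is taken to be closest to the processor. "Higher" levels are read as higher-numbered ones, and that direction is an assumption. `should_allocate` and `allocation_mask` are hypothetical names. The hint names the lowest managed level, and allocation happens at that level and at every level above it.

```c
#include <stdbool.h>

#define NUM_LEVELS 3 /* hypothetical hierarchy; level 1 is assumed closest to the processor */

/* The hint names the lowest level at which allocation is managed. That level
   and every higher-numbered level allocate; lower-numbered levels treat the
   access as non-temporal and skip allocation. A hint above NUM_LEVELS
   therefore allocates nowhere. */
bool should_allocate(int level, int hint_level) {
    return level >= hint_level && level <= NUM_LEVELS;
}

/* Bitmask of levels that allocate for a given hint (bit 0 = level 1). */
unsigned allocation_mask(int hint_level) {
    unsigned mask = 0;
    for (int level = 1; level <= NUM_LEVELS; level++) {
        if (should_allocate(level, hint_level)) mask |= 1u << (level - 1);
    }
    return mask;
}
```

Under these assumptions, a hint of 1 allocates everywhere, a hint of 2 bypasses only the closest cache, and a hint past the last level marks the access as non-temporal throughout.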
Abstract:
As shown in the Figure, a technique is provided for controlling memory access ordering in a multiprocessing system (11). Under this technique, a sequence of accesses to acquire, access, and release a shared space of memory (15) is strictly adhered to through two specialized instructions that control memory (15) access. The two instructions, denoted MFDA (Memory Fence Directional - Acquire) and MFDR (Memory Fence Directional - Release), are used to control the ordering. The MFDA instruction ensures that all previous accesses to the specified address (typically a lock controlling access to the shared space (15)) become visible to other processors before any future accesses are permitted. The MFDR instruction ensures that all previous accesses become visible to other processors before any future access to the specified address.
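The acquire/access/release discipline closely resembles C11's acquire and release memory orderings, which can serve as an analogue. This is a sketch of that analogue, not of the MFDA and MFDR instructions themselves, and `SpinLock`, `spin_acquire`, `spin_release`, and `spin_is_locked` are hypothetical names. The acquire operation on the lock keeps later accesses to the shared space from moving ahead of it. The release store keeps earlier accesses from moving past it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock guarding a shared region of memory. */
typedef struct {
    atomic_bool locked;
} SpinLock;

/* Acquire: no later access to the shared region may be ordered before this
   exchange on the lock (analogous to the MFDA ordering). */
void spin_acquire(SpinLock *l) {
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire)) {
        /* spin until the current holder releases the lock */
    }
}

/* Release: every earlier access to the shared region becomes visible before
   the store that frees the lock (analogous to the MFDR ordering). */
void spin_release(SpinLock *l) {
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* Inspect the lock state (no ordering needed for a simple status check). */
bool spin_is_locked(SpinLock *l) {
    return atomic_load_explicit(&l->locked, memory_order_relaxed);
}
```

As in the abstract, the ordering is attached to accesses to one specific address, the lock. No full fence is applied to every memory operation.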
Abstract:
A processor includes a first register (209) for storing a first packed data, a decoder (202), and a functional unit (203). The decoder has a control signal input (207) for receiving a first control signal and a second control signal. The first control signal indicates a pack operation, and the second control signal indicates an unpack operation. The functional unit is coupled to the decoder (202) and the register (209). The functional unit performs the pack operation and the unpack operation using the first packed data, and it also performs a move operation.
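Pack and unpack can be sketched with two common variants. The pack narrows two signed 32-bit lanes from each operand into 16-bit lanes with saturation, and the unpack interleaves the low 16-bit lanes of two operands. This is a minimal sketch, and the lane widths, the saturation rule, and the interleave order are all assumptions, since the abstract does not specify them. `pack_ss32to16` and `unpack_lo16` are hypothetical names.

```c
#include <stdint.h>

/* Read 32-bit lane i (0 = low) as a signed value without implementation-defined casts. */
static int64_t lane_s32(uint64_t v, int i) {
    int64_t x = (int64_t)((v >> (32 * i)) & 0xFFFFFFFFu);
    return x >= 0x80000000LL ? x - 0x100000000LL : x;
}

/* Clamp to the signed 16-bit range and return the bit pattern. */
static uint16_t sat16(int64_t x) {
    if (x > 32767) x = 32767;
    if (x < -32768) x = -32768;
    return (uint16_t)x;
}

/* Pack: a's two 32-bit lanes become result lanes 0-1, b's become lanes 2-3. */
uint64_t pack_ss32to16(uint64_t a, uint64_t b) {
    return (uint64_t)sat16(lane_s32(a, 0))
         | (uint64_t)sat16(lane_s32(a, 1)) << 16
         | (uint64_t)sat16(lane_s32(b, 0)) << 32
         | (uint64_t)sat16(lane_s32(b, 1)) << 48;
}

/* Unpack: interleave the two low 16-bit lanes of a and b as a0, b0, a1, b1. */
uint64_t unpack_lo16(uint64_t a, uint64_t b) {
    uint64_t a0 = a & 0xFFFFu, a1 = (a >> 16) & 0xFFFFu;
    uint64_t b0 = b & 0xFFFFu, b1 = (b >> 16) & 0xFFFFu;
    return a0 | (b0 << 16) | (a1 << 32) | (b1 << 48);
}
```

Packing trades precision for density, for example when shrinking 32-bit intermediate results back to 16-bit samples. Unpacking widens or interleaves data before further packed arithmetic.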
Abstract:
A processor has first and second storage areas holding first and second packed data, respectively. Each packed data includes a first, second, third, and fourth data element. A multiply-add circuit is coupled to the first and second storage areas. The multiply-add circuit includes a first (810), second (811), third (812), and fourth (813) multiplier, each of which receives a corresponding set of the data elements. The multiply-add circuit further includes a first adder (850) coupled to the first and second multipliers (810, 811), and a second adder (851) coupled to the third and fourth multipliers (812, 813). A third storage area (871) is coupled to the adders (850, 851). The third storage area (871) includes first and second fields for saving the outputs of the first and second adders (850, 851), respectively, as the first and second data elements of a third packed data.
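The datapath described above maps directly onto a small function, with each multiplier and adder labeled by its reference numeral. This is a minimal sketch assuming signed 16-bit data elements and 32-bit result fields. Each sum is truncated to 32 bits, which is also an assumption because the abstract does not say how overflow is handled. `multiply_add_circuit` is a hypothetical name.

```c
#include <stdint.h>

/* Read 16-bit element i (0 = low) as a signed value. */
static int64_t elem_s16(uint64_t v, int i) {
    int64_t x = (int64_t)((v >> (16 * i)) & 0xFFFFu);
    return x >= 0x8000 ? x - 0x10000 : x;
}

/* Four 16x16 multipliers feed two adders whose outputs fill the two 32-bit
   fields of the result (sums truncated to 32 bits by assumption). */
uint64_t multiply_add_circuit(uint64_t src1, uint64_t src2) {
    int64_t p0 = elem_s16(src1, 0) * elem_s16(src2, 0); /* first multiplier (810) */
    int64_t p1 = elem_s16(src1, 1) * elem_s16(src2, 1); /* second multiplier (811) */
    int64_t p2 = elem_s16(src1, 2) * elem_s16(src2, 2); /* third multiplier (812) */
    int64_t p3 = elem_s16(src1, 3) * elem_s16(src2, 3); /* fourth multiplier (813) */
    uint32_t field0 = (uint32_t)(p0 + p1); /* first adder (850) -> first field */
    uint32_t field1 = (uint32_t)(p2 + p3); /* second adder (851) -> second field */
    return ((uint64_t)field1 << 32) | field0; /* third storage area (871) */
}
```

The only input that overflows the 32-bit fields is the corner case where all four elements are -32768. Each pair then sums to 2^31, which truncates to 0x80000000.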