摘要:
This invention is a digital signal processor form plural sums of absolute values (SAD) in a single operation. An operational unit performing a sum of absolute value operation comprising two sets of a plurality of rows, each row producing a SAD output. Plural absolute value difference units receive corresponding packed candidate pixel data and packed reference pixel data. A row summer sums the output of the absolute value difference units in the row. The candidate pixels are offset relative to the reference pixels by one pixel for each succeeding row in a set of rows. The two sets of rows operate on opposite halves of the candidate pixels packed within an instruction specified operand. The SAD operations can be performed on differing data widths employing carry chain control in the absolute difference unit and the row summers.
摘要:
A circuit arrangement and program product provide support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.
摘要:
Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
摘要:
A logic circuit is configured to calculate a sliding sum of absolute differences of a plurality of numbers from a plurality of members respectively selected successively from all members of a sequence of numbers. The logic circuit reduces an amount of logic that is required to perform the sum of absolute differences, and thereby saves resources and latency.
摘要:
Embodiments of a near optimal configurable adder tree for arbitrary shaped 2D block sum of absolute differences (SAD) calculation engine are generally described herein. Other embodiments may be described and claimed. In some embodiments, a configurable two-dimensional adder tree architecture for computing a sum of absolute differences (SAD) for various block sizes up to 16 by 16 comprises a first stage of one-dimensional adder trees and a second stage of one-dimensional adder trees, wherein each one-dimensional adder tree comprises an input routing network, a plurality of adder units, and an output routing network.
摘要:
Computing an absolute difference includes receiving a first value and a second value. Propagate terms are determined according to the first value and the second value at one or more adders (24). The second value is subtracted from the first value using the propagate terms to yield a subtraction difference. It is determined at one or more correctors (26) whether the subtraction difference is negative. If the subtraction difference is negative, the subtraction difference is modified according to the propagate terms to compute an absolute difference between the first value and the second value. Otherwise, the subtraction difference is reported as the absolute difference between the first value and the second value.
摘要:
The proposed hardware architecture is integrated onto a Digital Signal Processor (DSP) as a coprocessor to assist in the computation of sum of absolute differences, symmetrical row/column Finite Impulse Response (FIR) filtering with a downsampling (or upsampling) option, row/column Discrete Cosine Transform (DCT)/Inverse Discrete Cosine Transform (IDCT), and generic algebraic functions. The architecture is called IPP, which stands for image processing peripheral, and consists of 8 hardware multiply-accumulate units connected in parallel and routed and multiplexed together. The architecture can be dependent upon a Direct Memory Access (DMA) controller to retrieve and write back data from/to DSP memory without intervention from the DSP core. The DSP can set up the DMA transfer and IPP/DMA synchronization in advance, then go on its own processing task. Alternatively, the DSP can perform the data transfers and synchronization itself by synchronizing with the IPP architecture on these transfers. This hardware architecture implements 2-D filtering, symmetrical filtering, short filters, sum of absolute differences, and mosaic decoding more quickly(in terms of clock cycles) and efficiently than previously disclosed architectures of the prior art which perform the same operations in software.
摘要:
A method and apparatus that adds each one of multiple elements of a packed data together to produce a result. According to one such a method and apparatus, each of a first set of portions of partial products is produced using a first set of partial product selectors in a multiplier, each of the first set of portions of the partial products being zero. Each of the multiple elements is inserted into one of a second set of portions of the partial products using a second set of partial product selectors, each of the second set of portions of the partial products being aligned. Each of the multiple elements are added together to produce the result including a field having the sum of the multiple elements.
摘要:
A pipelined data path architecture for use, in one embodiment, in a multimedia processor. The data path architecture requires a maximum of two execution pipestages to perform all instructions including wide data format multiply instructions and specially adapted multimedia instructions, such as the sum of absolute differences (SABD) instruction and other multiply with add (MADD) instructions. The data path architecture includes two wide data format input registers that feed four partitioned 32×32 multiplier circuits. Within two pipestages, the multiply circuit can perform one 128×128 multiply operation, four 32×32 multiply operations, eight 16×16 multiply operations or sixteen 8×8 multiply operations in parallel. The multiply circuit contains a compressor tree which generates a 256-bit sum and a 256-bit carry vector. These vectors are supplied to four 64-bit carry propagate adder circuits which generate the multiply results. When the data path architecture is performing specially adapted multimedia instructions the input registers are supplied to a pipelined logic unit containing adders, subtractors, shifters, average/round/absolute value circuits, and other logic operation circuits, compressor circuits and multiplexers. The output of the pipelined logic unit is then fed to the four 64-bit carry propagate adder circuits. In this way, the adder circuits of the multiply operation can be effectively used to also process the specially adapted multimedia instructions thereby saving IC area. Multiply circuitry is disabled to save power when the data path architecture is not processing a multiplication instruction.
摘要:
There is disclosed a compact arithmetic circuit for obtaining an absolute-valued distance (AVD) between two digital data. An inverter 11 generates the data * .beta. which is derived from the data .beta. as the 1's complement thereof. An ALU 12 adds the data .alpha. and the data * .beta. when a carry data C1 is inputted thereto. An inverter 13 generates a new carry data C2 from the add result of the ALU 12 and feeds it back to the ALU 12. Each bit of the add result by the ALU 12 is inverted by an inverter 15 and is given to a selector 14. After the initial add operation, the ALU 12 adds the data .alpha. and the data * .beta. by using carry data C2 as the input thereto. A selector 14 selects either the first add result by the ALU 12, the output data from the inverter 15 corresponding to the first add result, the second add result by the ALU 12, or the output data of the inverter 15 corresponding to the second add result, and outputs the selected as a correct AVD.