Abstract:
An apparatus is described that includes a semiconductor chip with an instruction execution pipeline having one or more execution units with respective logic circuitry to: a) execute a first instruction that multiplies a first input operand and a second input operand and presents a lower portion of the result, where the first and second input operands are respective elements of first and second input vectors; b) execute a second instruction that multiplies a first input operand and a second input operand and presents an upper portion of the result, where the first and second input operands are respective elements of first and second input vectors; and c) execute an add instruction where a carry term of the addition is recorded in a mask register.
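The three instructions described above can be modeled in software as follows. This is a minimal sketch, not the claimed circuitry: the function names (`vmul_lo`, `vmul_hi`, `vadd_carry`), the unsigned interpretation of the elements, and the default 64-bit element width are all assumptions made for illustration.

```python
def vmul_lo(a, b, width=64):
    """Element-wise multiply; keep the lower `width` bits of each product."""
    mask = (1 << width) - 1
    return [(x * y) & mask for x, y in zip(a, b)]


def vmul_hi(a, b, width=64):
    """Element-wise multiply; keep the upper `width` bits of each product."""
    mask = (1 << width) - 1
    return [((x * y) >> width) & mask for x, y in zip(a, b)]


def vadd_carry(a, b, width=64):
    """Element-wise add; return the wrapped sums and a bit mask of carries.

    Bit i of the returned mask is set when element i produced a carry out,
    which stands in for the mask register mentioned in the abstract.
    """
    mask = (1 << width) - 1
    sums = []
    carry_mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        s = x + y
        sums.append(s & mask)
        if s >> width:
            carry_mask |= 1 << i
    return sums, carry_mask
```

With an 8-bit element width, `vmul_lo([200], [100], width=8)` and `vmul_hi([200], [100], width=8)` together give the two halves of the 16-bit product 20000, and `vadd_carry` reports which lanes overflowed so that a later step can propagate the carries.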
Abstract:
A math circuit for computing an estimate of a transcendental function is described. A lookup table storage circuit has stored therein several groups of binary values, where each group of values represents a respective coefficient of a first polynomial that estimates the function to a high precision. A first computing circuit uses a binary value from each group of values, to evaluate the first polynomial. A second computing circuit uses a portion of a binary value, that is also taken from one of the groups of values, to evaluate a second polynomial that estimates the function to a low precision. Other embodiments are also described and claimed.
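The idea of sharing one coefficient table between a high-precision and a low-precision estimate can be sketched as below. Everything specific here is an assumption for illustration: the target function (exp), the use of Taylor coefficients, the single table entry per coefficient group, the 24-bit fixed-point format, and the names `eval_high` and `eval_low`. The abstract does not state any of these.

```python
FRAC_BITS = 24   # assumed fractional bits of each stored coefficient
LOW_BITS = 8     # assumed number of leading fractional bits kept for low precision


def to_fixed(value):
    """Quantize a real coefficient to an integer with FRAC_BITS fractional bits."""
    return int(round(value * (1 << FRAC_BITS)))


# Hypothetical table: one stored binary value per coefficient group
# (Taylor coefficients of exp(x) around 0, chosen only as an example).
TABLE = [to_fixed(1.0), to_fixed(1.0), to_fixed(0.5), to_fixed(1.0 / 6.0)]


def eval_high(x):
    """First polynomial: cubic, using the full stored value from every group."""
    acc = 0.0
    for coeff in reversed(TABLE):          # Horner's rule
        acc = acc * x + coeff / (1 << FRAC_BITS)
    return acc


def eval_low(x):
    """Second polynomial: linear, using only the top bits of stored values."""
    shift = FRAC_BITS - LOW_BITS
    c0 = (TABLE[0] >> shift) / (1 << LOW_BITS)
    c1 = (TABLE[1] >> shift) / (1 << LOW_BITS)
    return c0 + c1 * x
```

The low-precision path reads the same storage as the high-precision path but discards the low-order bits, so no second table is needed.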
Abstract:
The present invention provides a system and method for improving the performance of general-purpose processors by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes the entire resources of a 128b by 128b multiplier regardless of the operand size, because the number of elements in the matrix and vector operands increases as the operand size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest-possible intermediate accuracy with modest resources.
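The relationship between operand size and element count can be illustrated with a small model. This is a sketch under stated assumptions: registers are modeled as 128-bit Python integers, elements are unsigned, the matrix is supplied as one packed register per row, and the names `unpack` and `matvec` are hypothetical. Python integers do not overflow, which stands in for the full intermediate accuracy described above.

```python
TOTAL_BITS = 128  # register width taken from the abstract


def unpack(reg, elem_bits):
    """Split a 128-bit integer into TOTAL_BITS // elem_bits unsigned elements.

    Element 0 occupies the least significant bits (an assumed layout).
    """
    count = TOTAL_BITS // elem_bits
    mask = (1 << elem_bits) - 1
    return [(reg >> (i * elem_bits)) & mask for i in range(count)]


def matvec(matrix_rows, vec_reg, elem_bits):
    """Multiply a matrix (one packed register per row) by a packed vector.

    Each dot product is accumulated at full precision, so no intermediate
    bits are lost before the caller decides how to round or truncate.
    """
    vec = unpack(vec_reg, elem_bits)
    result = []
    for row_reg in matrix_rows:
        row = unpack(row_reg, elem_bits)
        result.append(sum(a * b for a, b in zip(row, vec)))
    return result
```

Halving the element size doubles the element count per register: `unpack` yields 2 elements at 64 bits, 4 at 32 bits, and 16 at 8 bits, so a smaller operand size implies a larger matrix and more multiplications per operation.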
Abstract:
In one embodiment, a dual mode execution unit is described for use in a general-purpose digital signal processor (DSP). The execution unit can operate as a 16X16 multiplier in one mode and an 8-bit adder tree in another mode. The adder tree structure is constructed by reusing pre-existing arithmetic logic units (ALUs) in the multiplier array of the multiplier architecture. The 8-bit adder tree mode is particularly useful for performing various computation-intensive algorithms used in digital video processing, such as motion search and spatial interpolation algorithms.
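The two operating modes can be modeled behaviorally as shown below. This is a sketch, not the described hardware: the function names are hypothetical, operands are treated as unsigned, and the use of a sum of absolute differences (SAD) as the motion-search workload is an assumption, since the abstract names motion search but not the specific metric.

```python
def multiply_16x16(a, b):
    """Multiplier mode: unsigned 16-bit by 16-bit product (up to 32 bits)."""
    return (a & 0xFFFF) * (b & 0xFFFF)


def adder_tree(values):
    """Adder-tree mode: reduce the inputs pairwise, level by level."""
    level = list(values) or [0]
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)                # pad odd levels with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def sad(block_a, block_b):
    """Sum of absolute differences of two pixel blocks via the adder tree.

    SAD is assumed here as a typical motion-search cost function.
    """
    return adder_tree([abs(a - b) for a, b in zip(block_a, block_b)])
```

The pairwise reduction mirrors how a tree of adders sums many small operands in a logarithmic number of levels, which is the same kind of summation a multiplier array performs on its partial products.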
Abstract:
A method and system for processing graphics data in a computer system are disclosed. The method and system include providing a general-purpose processor and providing a vector co-processor coupled with the general-purpose processor. The general-purpose processor includes an instruction queue for holding a plurality of instructions. The vector co-processor is for processing at least a portion of the graphics data using a portion of the plurality of instructions. The vector co-processor is capable of performing a plurality of mathematical operations in parallel. The plurality of instructions is provided using software written in a general-purpose programming language.
Abstract:
A data processing system is provided including an arithmetic logic unit (20, 22, 24) receiving input operands from M X-bit registers to produce output data words stored within N Y-bit registers, where M/N = 3,8