Abstract:
A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values are sorted in an order indicated by the vector sort instruction, and storing the sorted vector in a storage location.
Abstract:
A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.
Abstract:
This invention is a digital signal processor form plural sums of absolute values (SAD) in a single operation. An operational unit performing a sum of absolute value operation comprising two sets of a plurality of rows, each row producing a SAD output. Plural absolute value difference units receive corresponding packed candidate pixel data and packed reference pixel data. A row summer sums the output of the absolute value difference units in the row. The candidate pixels are offset relative to the reference pixels by one pixel for each succeeding row in a set of rows. The two sets of rows operate on opposite halves of the candidate pixels packed within an instruction specified operand. The SAD operations can be performed on differing data widths employing carry chain control in the absolute difference unit and the row summers.
Abstract:
An integrated circuit, comprising an instruction pipeline that includes instruction fetch phase circuitry, instruction decode phase circuitry, and instruction execution circuitry. The instruction execution circuitry includes transformation circuitry for receiving an interleaved dual vector operand as an input and for outputting a first natural order vector including a first set of data values from the interleaved dual vector operand and a second natural order vector including a second set of data values from the interleaved dual vector operand.
Abstract:
An example device includes a first register storing a first vector comprised of a set of vector elements; a second register having a set of lanes and configured to store a second vector; and a storage that stores a set of control elements. Each such control element corresponds to a respective one of the vector elements of the set of vector elements in the first register. In addition, each control element of the set of control elements has a first portion that specifies, for the corresponding vector element of the set of vector elements, a lane of the set of lanes of the second register, and a second portion that specifies whether the corresponding vector element of the set of vector elements is to be routed to the lane specified by the first portion. The example device further includes processing circuitry to, based on an instruction that specifies the first register and the second register, generate the second vector based on the set of control elements.
Abstract:
Devices and methods are provided for performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers. In an example, a device includes a processor that includes a multiply circuit. The multiply circuit is configured to multiply floating point numbers in response to a floating point multiply instruction, and is further configured to determine values of implied bits of mantissas of the floating point numbers, and multiply the mantissas in parallel with the determining operation.
Abstract:
A method is provided that includes performing, by a processor in response to a vector permutation instruction, permutation of values stored in lanes of a vector to generate a permuted vector, wherein the permutation is responsive to a control storage location storing permute control input for each lane of the permuted vector, wherein the permute control input corresponding to each lane of the permuted vector indicates a value to be stored in the lane of the permuted vector, wherein the permute control input for at least one lane of the permuted vector indicates a value of a selected lane of the vector is to be stored in the at least one lane, and storing the permuted vector in a storage location indicated by an operand of the vector permutation instruction.
Abstract:
A processor is provided that includes a first multiplication unit in a first data path of the processor, the first multiplication unit configured to perform single issue multiply instructions, and a second multiplication unit in the first data path, the second multiplication unit configured to perform single issue multiply instructions, wherein the first multiplication unit and the second multiplication unit are configured to execute respective single issue multiply instructions in parallel.
Abstract:
A processor is provided that includes a first multiplication unit in a first data path of the processor, the first multiplication unit configured to perform single issue multiply instructions, and a second multiplication unit in the first data path, the second multiplication unit configured to perform single issue multiply instructions, wherein the first multiplication unit and the second multiplication unit are configured to execute respective single issue multiply instructions in parallel.
Abstract:
A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.