Abstract:
The vector data path is divided into smaller vector lanes. The number of active vector lanes is controllable on the fly by the programmer to match the requirements of the executing program, and inactive vector lanes are powered down by the CPU to increase power efficiency of the vector processor.
Abstract:
A method of loading and duplicating scalar data from a source into a destination register. The data may be duplicated in byte, half word, word or double word parts, according to a duplication pattern.
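As an illustration only, the load-and-duplicate operation can be modeled in a few lines of Python. The sketch assumes the simplest duplication pattern (one element of the chosen size replicated across the whole destination register). The function name, the size labels, and the 64-byte register width are hypothetical and are not taken from the abstract.

```python
# Hypothetical element sizes in bytes; the abstract names the four granularities.
ELEMENT_BYTES = {"byte": 1, "half": 2, "word": 4, "double": 8}


def load_and_duplicate(memory: bytes, address: int, size: str,
                       register_bytes: int = 64) -> bytes:
    """Read one element at `address` and replicate it to fill a register.

    `register_bytes` is an assumed register width, not a value from the source.
    """
    width = ELEMENT_BYTES[size]
    if register_bytes % width != 0:
        raise ValueError("register width must be a multiple of the element size")
    element = memory[address:address + width]
    if len(element) != width:
        raise ValueError("element read runs past the end of memory")
    # Duplicate the element until the destination register is full.
    return element * (register_bytes // width)


memory = bytes(range(16))
register = load_and_duplicate(memory, 2, "half", register_bytes=8)
```

Here the half word at address 2 (bytes 2 and 3) is repeated four times to fill an 8-byte register.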
Abstract:
A method of storing data elements of a register so that they interleave with data elements of a different register, a processor thereof, and a system thereof, wherein each non-consecutive data element of a register is retrieved and stored so as to interleave with each non-consecutive data element of a different register upon execution of an interleaving store instruction, wherein a mask instruction directing a lane of a storage space in which the non-consecutive data elements are stored is executed in conjunction with the interleaving store instruction, and wherein a processor of a second type is configured to emulate a processor of a first type so as to store the non-consecutive data elements in the same way as they are stored in the first type processor.
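One possible reading of the interleaving store can be sketched in Python. The sketch assumes that element i of the first register lands at offset 2i and element i of the second register at offset 2i + 1, and that the mask simply enables or disables each lane. These semantics, and the name `interleaving_store`, are assumptions made for illustration. The abstract does not spell them out.

```python
def interleaving_store(memory, base, reg_a, reg_b, lane_mask):
    """Store reg_a and reg_b into memory so that their elements alternate.

    Assumed layout: a[0], b[0], a[1], b[1], ... starting at `base`.
    Lanes whose mask entry is False are left untouched in memory.
    """
    if not (len(reg_a) == len(reg_b) == len(lane_mask)):
        raise ValueError("registers and mask must have the same lane count")
    for lane, enabled in enumerate(lane_mask):
        if enabled:
            memory[base + 2 * lane] = reg_a[lane]
            memory[base + 2 * lane + 1] = reg_b[lane]
    return memory


memory = [0] * 8
interleaving_store(memory, 0, [1, 2, 3, 4], [10, 20, 30, 40],
                   [True, True, False, True])
```

With lane 2 masked off, the two memory slots for that lane keep their previous contents while the other lanes are written in alternating order.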
Abstract:
A digital signal processor having at least one streaming address generator, each with dedicated hardware, for generating addresses for writing multi-dimensional streaming data that comprises a plurality of elements. Each streaming address generator is configured to generate a plurality of offsets to address the streaming data, and each of the plurality of offsets corresponds to a respective one of the plurality of elements. The address of each element is its respective offset combined with a base address.
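The offset-plus-base scheme can be illustrated with a small software model. This is a sketch under stated assumptions: each dimension is described by a hypothetical (count, stride) pair listed outermost first, and "combined with a base address" is taken to mean addition. The real generator is dedicated hardware, which this does not attempt to reproduce.

```python
from itertools import product


def stream_offsets(dims):
    """Yield one byte offset per element of a multi-dimensional stream.

    `dims` is a list of (count, stride) pairs, outermost dimension first.
    Both the parameter shape and the names are illustrative assumptions.
    """
    for index in product(*(range(count) for count, _ in dims)):
        yield sum(i * stride for i, (_, stride) in zip(index, dims))


def stream_addresses(base, dims):
    """Combine each generated offset with the base address (by addition)."""
    return [base + offset for offset in stream_offsets(dims)]


# Two rows of three 4-byte elements, with rows 16 bytes apart.
addresses = stream_addresses(0x1000, [(2, 16), (3, 4)])
```

For this two-dimensional example the model produces 0x1000, 0x1004, 0x1008 for the first row and 0x1010, 0x1014, 0x1018 for the second.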
Abstract:
In an embodiment, a circuit includes a data path including at least a first lane of a first width and a second lane of a second, larger, width; an execution unit to execute a first instruction on data of the first width or less using the first lane, and to execute a second instruction on data greater than the first width and less than or equal to the second width using the second lane; and a control register that stores a value indicating which of the first and second lanes is to be used in instruction execution by the execution unit. The circuit is configured to, based on the value stored in the control register, power off the first lane when the execution unit executes the second instruction but not the first instruction, and power off the second lane when the execution unit executes the first instruction but not the second instruction.
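The lane selection and power gating described here can be modeled roughly as follows. The widths (64 and 512 bits), the string-valued control register, and all names are hypothetical. The sketch only mirrors the rule that the lane not named by the control register is powered off.

```python
from dataclasses import dataclass


@dataclass
class LaneConfig:
    first_width: int   # width in bits of the narrower lane (assumed value)
    second_width: int  # width in bits of the wider lane (assumed value)


def select_lane(cfg: LaneConfig, operand_bits: int) -> str:
    """Pick the lane an instruction would use, based on its data width."""
    if operand_bits <= cfg.first_width:
        return "first"
    if operand_bits <= cfg.second_width:
        return "second"
    raise ValueError("operand is wider than the data path")


def powered_lanes(control_value: str) -> dict:
    """Return the power state of each lane for a given control register value."""
    if control_value not in ("first", "second"):
        raise ValueError("unknown control register value")
    # Only the lane named by the control register stays powered.
    return {"first": control_value == "first",
            "second": control_value == "second"}


cfg = LaneConfig(first_width=64, second_width=512)
state = powered_lanes(select_lane(cfg, 128))
```

In this model a 128-bit operation selects the second lane, so the first lane is reported as powered off.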
Abstract:
In an example, a device includes a register file; a set of functional units coupled to the register file; and an instruction decoder coupled to the register file and to the set of functional units. The instruction decoder receives an executable instruction directed to a specific functional unit of the set of functional units. The executable instruction includes a segment specifying a register of the register file. The instruction decoder also provides the executable instruction to the specific functional unit. The specific functional unit then determines whether to execute the executable instruction based on a value stored in the register of the register file specified by the segment of the executable instruction.
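A minimal software model of this arrangement is sketched below. It assumes that a non-zero predicate register means "execute", and it uses a hypothetical add unit and instruction layout. The point it illustrates is that the decoder only routes the instruction, while the functional unit itself reads the predicate register and decides whether to execute.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    unit: str           # name of the target functional unit
    predicate_reg: int  # register index carried in the instruction segment
    dest: int
    src1: int
    src2: int


class AddUnit:
    """Hypothetical functional unit that checks its own predicate."""

    def execute(self, inst: Instruction, regs: list) -> bool:
        # Assumed convention: a zero predicate value suppresses execution.
        if regs[inst.predicate_reg] == 0:
            return False
        regs[inst.dest] = regs[inst.src1] + regs[inst.src2]
        return True


def dispatch(inst: Instruction, units: dict, regs: list) -> bool:
    """Decoder step: forward the instruction to the unit it names."""
    return units[inst.unit].execute(inst, regs)


regs = [0, 1, 5, 7, 0]
units = {"add": AddUnit()}
executed = dispatch(Instruction("add", predicate_reg=1, dest=4, src1=2, src2=3),
                    units, regs)
```

Because register 1 holds a non-zero value, the add is performed and register 4 receives 12. The same instruction predicated on register 0 would leave the register file unchanged.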
Abstract:
In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
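The loop control described above can be sketched in Python. The sketch assumes the predicate is a list of booleans, one per lane, that marks which lanes hold valid data, and it uses a running sum as a stand-in loop body. The function name and the zero padding of unused lanes are illustrative choices, not details from the abstract.

```python
def process_stream(data, vector_length):
    """Sum `data` one vector at a time, using the predicate as the loop test.

    Returns the total and the number of loop iterations executed.
    """
    total = 0
    iterations = 0
    position = 0
    while True:
        chunk = data[position:position + vector_length]
        # One predicate bit per lane: True where the lane holds a valid element.
        predicate = [lane < len(chunk) for lane in range(vector_length)]
        if not any(predicate):
            break  # no valid elements left, so the loop exits
        vector = chunk + [0] * (vector_length - len(chunk))
        total += sum(v for v, valid in zip(vector, predicate) if valid)
        position += vector_length
        iterations += 1
    return total, iterations


result = process_stream(list(range(10)), 4)
```

With ten elements and four lanes, the loop runs three times (the third vector is only half valid) and exits on the fourth fetch, whose predicate is all false.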
Abstract:
A method is shown that is operable to transform and align a plurality of fields from an input data stream to an output data stream using a multilayer butterfly or inverse butterfly network. Many transformations are possible with such a network, which may include separate control of each multiplexer. This invention supports a limited set of multiplexer control signals, which enables a similarly limited set of data transformations. This limited capability is offset by the reduced complexity of the multiplexer control circuits.
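The trade-off between control complexity and available transformations can be illustrated with a small model of a butterfly network. The sketch assumes one particular restriction, namely a single control bit shared by every multiplexer in a layer. The abstract does not say which control signals the invention actually supports, so this is an example of a limited control set rather than the claimed one.

```python
def butterfly(data, stage_controls):
    """Pass `data` through a butterfly network with one control bit per layer.

    Layer k pairs element i with element i XOR distance, where the distance
    halves at each layer. A True control bit swaps every pair in that layer.
    """
    n = len(data)
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("input length must be a power of two")
    stages = n.bit_length() - 1
    if len(stage_controls) != stages:
        raise ValueError("one control bit is required per layer")
    out = list(data)
    distance = n // 2
    for swap in stage_controls:
        if swap:
            out = [out[i ^ distance] for i in range(n)]
        distance //= 2
    return out


rotated = butterfly(list("abcdefgh"), [True, False, False])
```

With only three control bits for eight inputs, this model can produce just eight of the possible arrangements (each output i takes input i XOR m for some 3-bit m), which reflects the reduced control circuitry at the cost of fewer transformations. In the example, swapping only the first layer exchanges the two halves of the input.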