Abstract:
Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.
Abstract:
Vector processing engines (VPEs) employing reordering circuitry in data flow paths between execution units and vector data memory to provide in-flight reordering of output vector data stored to vector data memory are disclosed. Related vector processor systems and methods are also disclosed. Reordering circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The reordering circuitry is configured to reorder output vector data sample sets from execution units as a result of performing vector processing operations in-flight while the output vector data sample sets are being provided over the data flow paths from the execution units to the vector data memory to be stored. In this manner, the output vector data sample sets are stored in the reordered format in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in the execution units.
Abstract:
Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption are disclosed. Related vector processor systems and methods are also disclosed. The VPEs are configured to provide filter vector processing operations. To minimize re-fetching of input vector data samples from memory to reduce power consumption, a tapped-delay line(s) is included in the data flow paths between a vector data file and execution units in the VPE. The tapped-delay line(s) is configured to receive and provide input vector data sample sets to execution units for performing filter vector processing operations. The tapped-delay line(s) is also configured to shift the input vector data sample set for filter delay taps and provide the shifted input vector data sample set to execution units, so the shifted input vector data sample set does not have to be re-fetched during filter vector processing operations.
Abstract:
Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.
Abstract:
Apparatuses and techniques are disclosed herein that enable region-based cache management. In some aspects, a configuration for a region of cache memory is determined based on characteristics of information to be written to the cache memory. Based on the determined configuration, an address range of the cache memory is allocated to define the region within the cache memory. A cache policy is the applied to the allocated address range to control caching of the information written to the region of cache memory. By so doing, regions of cache memory and respective caching policies applied thereto can be optimized for a variety of information types or usages.
Abstract:
Techniques for fast power gating of vector processors are described herein. In one embodiment, a method for power gating a vector processor comprises powering up a vector unit from an inactive state at approximately a boundary of a transmission time interval, and powering down the vector unit within the transmission time interval after the vector unit completes a task within the transmission time interval.
Abstract:
Vector processing engines (VPEs) employing reordering circuitry in data flow paths between execution units and vector data memory to provide in-flight reordering of output vector data stored to vector data memory are disclosed. Related vector processor systems and methods are also disclosed. Reordering circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The reordering circuitry is configured to reorder output vector data sample sets from execution units as a result of performing vector processing operations in-flight while the output vector data sample sets are being provided over the data flow paths from the execution units to the vector data memory to be stored. In this manner, the output vector data sample sets are stored in the reordered format in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in the execution units.
Abstract:
Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption are disclosed. Related vector processor systems and methods are also disclosed. The VPEs are configured to provide filter vector processing operations. To minimize re-fetching of input vector data samples from memory to reduce power consumption, a tapped-delay line(s) is included in the data flow paths between a vector data file and execution units in the VPE. The tapped-delay line(s) is configured to receive and provide input vector data sample sets to execution units for performing filter vector processing operations. The tapped-delay line(s) is also configured to shift the input vector data sample set for filter delay taps and provide the shifted input vector data sample set to execution units, so the shifted input vector data sample set does not have to be re-fetched during filter vector processing operations.
Abstract:
Systems and methods for performing on-the-fly format conversion on data vectors during load/store operations are described herein. In one embodiment, a method for loading a data vector from a memory into a vector unit comprises reading a plurality of samples from the memory, wherein the plurality of samples are packed in the memory. The method also comprises unpacking the samples to obtain a plurality of unpacked samples, performing format conversion on the unpacked samples in parallel, and sending at least a portion of the format-converted samples to the vector unit.
Abstract:
Various aspects describe an on-chip, hardware error-generator component. In some cases, the hardware error-generator component connects to a data path between two components contained within a same chip. Upon receiving an error simulation input, the hardware error-generator component modifies data being transmitted on the data path by inserting a data pattern that simulates an error condition. Alternately or additionally, the hardware error-generator randomly alters one or more of the transmitted data bits.