Vector processor utilizing massively fused operations

    Publication number: US12282774B2

    Publication date: 2025-04-22

    Application number: US17358231

    Filing date: 2021-06-25

    Abstract: Techniques are disclosed for the use of fused vector processor instructions by a vector processor architecture. Each fused vector processor instruction may include a set of fields associated with individual vector processing instructions. The vector processor architecture may implement local buffers that allow a single fused vector processor instruction to execute each of the individual vector processing instructions without re-accessing the vector registers between each executed individual vector processing instruction. The architecture reduces communication across the interconnection network, thereby increasing interconnection network bandwidth and the speed of computations while decreasing power usage.
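
The local-buffer idea described in the abstract can be illustrated with a minimal sketch. All names here (`VectorUnit`, `fused_mac_shift`, the operand layout) are illustrative assumptions, not the patented instruction format; the point is only that a fused operation keeps its intermediates in a local buffer instead of writing them back to the vector register file between the constituent operations.

```python
# Hypothetical sketch: a fused instruction touches the vector register
# file only for the initial operand reads and the final result write,
# holding intermediate results in a local buffer in between.
class VectorUnit:
    def __init__(self, registers):
        self.registers = registers   # vector register file: name -> lanes
        self.rf_accesses = 0         # count register-file reads/writes

    def read(self, name):
        self.rf_accesses += 1
        return list(self.registers[name])

    def write(self, name, value):
        self.rf_accesses += 1
        self.registers[name] = list(value)

    def fused_mac_shift(self, a, b, c, dst, shift):
        """One fused instruction: dst = ((a*b) + c) >> shift, element-wise."""
        va, vb, vc = self.read(a), self.read(b), self.read(c)
        local = [x * y for x, y in zip(va, vb)]     # stays in local buffer
        local = [x + y for x, y in zip(local, vc)]  # no write-back between ops
        local = [x >> shift for x in local]
        self.write(dst, local)

unit = VectorUnit({"v0": [1, 2, 3], "v1": [4, 5, 6], "v2": [8, 8, 8]})
unit.fused_mac_shift("v0", "v1", "v2", "v3", shift=1)
```

Executed as three separate instructions, the same computation would cost a register-file round trip per step; fused, it costs three reads and one write in total.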

    PROCESSOR EMBEDDED STREAMING BUFFER

    Publication number: US20250103331A1

    Publication date: 2025-03-27

    Application number: US18971447

    Filing date: 2024-12-06

    Inventor: Joseph Williams

    Abstract: Techniques are disclosed for the use of local buffers integrated into the execution units of an array processor architecture. The use of local buffers reduces communication across the interconnection network implemented by the processors, which increases interconnection network bandwidth, increases the speed of computations, and decreases power usage.

    FLEXIBLE VECTORIZED PROCESSING ARCHITECTURE

    Publication number: US20240220249A1

    Publication date: 2024-07-04

    Application number: US18147099

    Filing date: 2022-12-28

    CPC classification number: G06F9/30036 G06F9/3001 G06F30/343

    Abstract: Techniques are disclosed for the implementation of a programmable processing array architecture that realizes vectorized processing operations for a variety of applications. Such vectorized processing operations may include digital front end (DFE) processing operations, such as finite impulse response (FIR) filter processing operations. The programmable processing array architecture provides a front-end interconnection network that generates specific sliding time window data patterns in accordance with the particular DFE processing operation to be executed. The architecture enables the data generated in accordance with these sliding time window patterns to be fed to a set of multipliers and adders to generate output data. The architecture allows a wide range of processing operations to be performed on a single programmable processing array platform by leveraging the programmable nature of the array and the use of instruction sets.
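
The sliding-time-window pattern feeding a multiplier/adder stage is the classic FIR structure y[n] = Σ h[k]·x[n−k]. The sketch below is a plain software model of that dataflow, not the patented interconnection network; the zero-padded history and function names are assumptions for illustration.

```python
# Illustrative FIR dataflow: a front-end stage produces per-output sliding
# windows x[n], x[n-1], ..., x[n-taps+1]; a multiplier/adder stage reduces
# each window against the coefficients.
def sliding_windows(samples, taps):
    """Windows over the input, with zero history before the first sample."""
    padded = [0] * (taps - 1) + list(samples)
    return [padded[n:n + taps][::-1] for n in range(len(samples))]

def fir(samples, coeffs):
    windows = sliding_windows(samples, len(coeffs))
    # multiplier/adder stage: dot product of each window with coefficients
    return [sum(h * x for h, x in zip(coeffs, w)) for w in windows]

# 3-tap moving-sum as a simple FIR example
y = fir([3, 6, 9, 12], [1, 1, 1])
```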

    VECTOR PROCESSOR SUPPORTING LINEAR INTERPOLATION ON MULTIPLE DIMENSIONS

    Publication number: US20220197640A1

    Publication date: 2022-06-23

    Application number: US17131939

    Filing date: 2020-12-23

    Abstract: Techniques are disclosed for a vector processor architecture that enables data interpolation in accordance with multiple dimensions, such as one-, two-, and three-dimensional linear interpolation. The vector processor architecture includes a vector processor and accompanying vector addressable memory that enable simultaneous retrieval of multiple entries in the vector addressable memory to facilitate linear interpolation calculations. The vector processor architecture vastly increases the speed at which such calculations may occur compared to conventional processing architectures. Example implementations include the calculation of digital pre-distortion (DPD) coefficients for use with radio frequency (RF) transmitter chains to support multi-band applications.
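
The two-dimensional case can be sketched as bilinear interpolation over a gridded table: the four neighboring entries (which the patent's vector-addressable memory would fetch simultaneously) are combined with two cascaded 1-D lerps. The unit grid spacing and names below are assumptions for illustration only.

```python
# Minimal bilinear interpolation sketch over a nested-list table,
# assuming unit spacing between grid points.
from math import floor

def bilerp(table, x, y):
    """Bilinearly interpolate table[i][j] at fractional coordinates (x, y)."""
    i, j = floor(x), floor(y)
    fx, fy = x - i, y - j
    # the four simultaneously-retrieved neighboring entries
    q00, q01 = table[i][j],     table[i][j + 1]
    q10, q11 = table[i + 1][j], table[i + 1][j + 1]
    top = q00 * (1 - fy) + q01 * fy   # 1-D lerp along y at row i
    bot = q10 * (1 - fy) + q11 * fy   # 1-D lerp along y at row i+1
    return top * (1 - fx) + bot * fx  # 1-D lerp along x

grid = [[0.0, 1.0],
        [2.0, 3.0]]
val = bilerp(grid, 0.5, 0.5)   # value at the center of the cell
```

The same cascade extends to three dimensions with eight neighbors and three lerp levels.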

    Apparatuses, methods, and systems for a user defined formatting instruction to configure multicast Benes network circuitry

    Publication number: US11334356B2

    Publication date: 2022-05-17

    Application number: US16457994

    Filing date: 2019-06-29

    Abstract: Systems, methods, and apparatuses relating to a user defined formatting instruction to configure multicast Benes network circuitry are described. In one embodiment, a processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having fields that identify packed input data, packed control data, and a packed data destination; and an execution unit to execute the decoded single instruction to: send the packed control data to respective control inputs of a circuit that comprises an inverse butterfly circuit coupled in series to a butterfly circuit, wherein the inverse butterfly circuit comprises a first plurality of stages of multicast switches and the butterfly circuit comprises a second plurality of stages of multicast switches, read, once from storage separate from the circuit, each element of the packed input data as respective inputs of the circuit, route the packed input data through the circuit according to the packed control data, and store resultant packed data from the circuit into the packed data destination.
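
A Benes-style network of the kind described (an inverse butterfly in series with a butterfly, built from multicast 2x2 switches) can be modeled in a few lines. The stage ordering and the control encoding below (0 = pass, 1 = swap, 2 = broadcast the upper input, 3 = broadcast the lower input) are assumptions for illustration, not the patented control format.

```python
# Sketch of routing packed data through stages of multicast 2x2 switches.
def stage(data, stride, controls):
    """One stage of switches pairing elements `stride` positions apart."""
    out = list(data)
    ctl = iter(controls)
    for i in range(len(data)):
        if (i // stride) % 2 == 0:      # i is the upper input of a pair
            a, b, c = data[i], data[i + stride], next(ctl)
            if c == 1:   out[i], out[i + stride] = b, a   # swap
            elif c == 2: out[i], out[i + stride] = a, a   # multicast upper
            elif c == 3: out[i], out[i + stride] = b, b   # multicast lower
    return out

def benes(data, stage_controls):
    """Inverse butterfly (stride 1, 2, ..., n/2) followed by a butterfly
    (stride n/2, ..., 2, 1), one control list per stage."""
    n, strides, s = len(data), [], 1
    while s < n:
        strides.append(s)
        s *= 2
    strides += strides[::-1]
    for stride, controls in zip(strides, stage_controls):
        data = stage(data, stride, controls)
    return data

# 4-lane example: swap adjacent pairs in the first stage, pass elsewhere
routed = benes([10, 20, 30, 40], [[1, 1], [0, 0], [0, 0], [0, 0]])
```

As in the abstract, the input data is read once at the front of the network and only control values steer it thereafter.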

    PROGRAMMABLE PROCESSING ARRAY SUPPORTING MULTI-DIMENSIONAL INTERPOLATION COMPUTATIONS

    Publication number: US20240134818A1

    Publication date: 2024-04-25

    Application number: US18533369

    Filing date: 2023-12-08

    CPC classification number: G06F15/8007 G06F1/03

    Abstract: Techniques are disclosed for a programmable processor architecture that enables data interpolation using an architecture that iteratively processes portions of a look-up table (LUT) in accordance with a fused single instruction stream, multiple data streams (SIMD) instruction. The LUT may contain segment entries that correspond to a result of evaluating a function using corresponding index values, which represent an independent variable of the function. The index values are used to map data sample values in a data array that is to be interpolated to the segment entries. By using an iterative process of mapping data samples to valid segment entries contained in each LUT portion, the architecture advantageously facilitates scaling to support larger LUTs and thus may be expanded to enable linear interpolation on multiple dimensions.
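
The iterative portion-by-portion idea can be sketched as follows: the LUT is visited a few entries at a time, and in each pass only the samples whose valid segment falls inside the current portion are interpolated. The function name, portion size, and scalar loop are illustrative assumptions; the patent describes this as a fused SIMD operation, which this plain-Python model does not attempt to reproduce.

```python
# Piecewise-linear interpolation against a LUT of (index, segment) pairs,
# processing the LUT `portion` entries at a time.
def interp_by_portions(index_vals, segment_vals, samples, portion=4):
    out = [None] * len(samples)
    for start in range(0, len(index_vals) - 1, portion - 1):
        stop = min(start + portion - 1, len(index_vals) - 1)
        lo, hi = index_vals[start], index_vals[stop]
        for k, x in enumerate(samples):
            if out[k] is None and lo <= x <= hi:
                # map the sample to its valid segment within this portion
                for i in range(start, stop):
                    if index_vals[i] <= x <= index_vals[i + 1]:
                        t = (x - index_vals[i]) / (index_vals[i + 1] - index_vals[i])
                        out[k] = segment_vals[i] + t * (segment_vals[i + 1] - segment_vals[i])
                        break
    return out

# LUT for f(x) = x*x sampled at integer index values
idx = [0, 1, 2, 3, 4]
seg = [0, 1, 4, 9, 16]
vals = interp_by_portions(idx, seg, [0.5, 2.5, 3.5], portion=3)
```

Because each pass needs only one LUT portion in fast storage, the scheme scales to tables larger than what fits in a single fetch.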

    Apparatuses, methods, and systems for vector processor architecture having an array of identical circuit blocks

    Publication number: US11074213B2

    Publication date: 2021-07-27

    Application number: US16457993

    Filing date: 2019-06-29

    Abstract: Systems, methods, and apparatuses relating to vector processor architecture having an array of identical circuit blocks are described. In one embodiment, a processor includes a single centralized circuit comprising an instruction decoder and a controller; and a plurality of circuit slices that each comprise an arithmetic logic unit, a multiplier, a register file, a local memory, and a same plurality of logic circuits and a packed data datapath in between, wherein each circuit slice includes a physical port that provides a unique identification value that identifies a circuit slice from the other circuit slices, and the controller is to broadcast a same configuration value to the plurality of circuit slices to cause a first circuit slice to enable a first logic circuit and enable a second logic circuit of the first circuit slice based on its unique identification value and the configuration value, and cause a second circuit slice to enable a same, first logic circuit and disable a same, second logic circuit of the second circuit slice based on its unique identification value and the configuration value.
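
The broadcast-configuration behavior in the abstract (one value sent to every slice, each slice enabling different logic based on its unique identification value) can be sketched briefly. The bit layout below, where even- and odd-numbered slices read different bytes of the config word, is purely an assumed example of how an ID could select among enables.

```python
# Each slice receives the same broadcast config value and combines it
# with its hard-wired ID to decide which logic circuits to enable.
class Slice:
    def __init__(self, slice_id):
        self.slice_id = slice_id    # unique value from the physical port
        self.enabled = set()

    def configure(self, config):
        """Assumed layout: low byte is the enable bitmask for even-ID
        slices, the next byte is the bitmask for odd-ID slices."""
        mask = (config >> (8 * (self.slice_id % 2))) & 0xFF
        self.enabled = {bit for bit in range(8) if mask & (1 << bit)}

slices = [Slice(i) for i in range(4)]
config = (0b0010 << 8) | 0b0011   # odd slices: circuit 1; even: circuits 0, 1
for s in slices:                  # the same value is broadcast to all slices
    s.configure(config)
```

The controller thus needs only one broadcast, yet identical slices end up configured differently.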

    PROCESSING PIPELINE WITH ZERO LOOP OVERHEAD

    Publication number: US20240345839A1

    Publication date: 2024-10-17

    Application number: US18647891

    Filing date: 2024-04-26

    Abstract: Techniques are disclosed for reducing or eliminating loop overhead caused by function calls in processors that form part of a pipeline architecture. The processors in the pipeline process data blocks in an iterative fashion, with each processor in the pipeline completing one of several iterations associated with a processing loop for a commonly-executed function. The described techniques leverage the use of message passing for pipelined processors to enable an upstream processor to signal to a downstream processor when processing has been completed, and thus a data block is ready for further processing in accordance with the next loop processing iteration. The described techniques facilitate a zero loop overhead architecture, enable continuous data block processing, and allow the processing pipeline to function indefinitely within the main body of the processing loop associated with the commonly-executed function where efficiency is greatest.
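
The message-passing scheme can be modeled with threads and queues standing in for pipelined processors and their message interface: each stage performs its one iteration of the shared loop body on a block, then signals the downstream stage by passing the block along, so no stage re-enters function-call prologue/epilogue between blocks. All names here are illustrative, and a `None` sentinel is an assumed shutdown convention.

```python
# Three pipelined stages, each performing one iteration (x -> x + 1) of a
# shared 3-iteration loop body on a stream of data blocks.
import queue
import threading

def stage_worker(inbox, outbox, iteration):
    """One pipelined processor: apply its loop iteration to each incoming
    block, then message the downstream stage that the block is ready."""
    while True:
        block = inbox.get()
        if block is None:          # shutdown sentinel
            outbox.put(None)
            return
        outbox.put(iteration(block))

queues = [queue.Queue() for _ in range(4)]
for k in range(3):
    threading.Thread(target=stage_worker,
                     args=(queues[k], queues[k + 1], lambda b: b + 1),
                     daemon=True).start()

for block in [0, 10, 20]:          # blocks stream in continuously
    queues[0].put(block)
queues[0].put(None)

results = []
while True:
    out = queues[-1].get()
    if out is None:
        break
    results.append(out)
```

Each worker stays inside its `while` loop, the hot main body, for the lifetime of the stream, which is the zero-loop-overhead property the abstract describes.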
