摘要:
A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more instructions that separate a vector or combine two vectors. The computer processor may be implemented as a monolithic integrated circuit.
摘要:
In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.
摘要:
Techniques are provided for granular load and refresh of columnar data. In an embodiment, a particular data object that contains particular data formatted different from column-major format is maintained, the particular data including first data and second data. First and second data objects contain the first and second data, respectively, organized in the column-major format. In response to changes being committed to the first data in the particular data object, invalidating one or more rows of the first data object. In response to a number of invalidated rows of the first data object exceeding a threshold, automatically performing a refresh operation on the first data object independent of any refresh operation on the second data object.
摘要:
Embodiments relate to vector processors. An aspect includes endian-mode-sensitive memory instructions for a vector processor. One embodiment includes a computer-implemented method for copying data between a vector register that includes byte elements 0 to S and a memory that is byte addressable. The computer-implemented method includes obtaining a vector instruction by a processor in a computer. The processor determines that the vector instruction is a memory access instruction specifying the vector register and a memory address. In response to the determination that is instruction is a memory access instruction and independent of a current global endian mode setting that is selectable in the processor, the processor executes the memory access instruction by copying the byte data between the memory and the vector register so that the byte element n of the vector register corresponds to the memory address+n for n=0 to S.
摘要:
An aspect includes implementing endian-mode-sensitive memory instructions for a vector processor. One such system includes a byte addressable memory and a processor. The processor includes a register that includes a plurality of byte elements 0 to S. The system is configured to perform a method that includes obtaining an instruction by the processor and determining that the instruction is a memory access instruction specifying the register and a memory address. In response to the determination that the instruction is a memory access instruction and independent of a current global endian mode setting that is selectable in the processor, the memory access instruction is executed by copying the byte data between the memory and the register so that the byte element n of the register corresponds to the memory address+n for n=0 to S.
摘要:
An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
摘要:
A vector slot processor that is capable of supporting multiple signal processing operations for multiple demodulation standards is provided. The vector slot processor includes a plurality of micro execution slot (MES) that performs the multiple signal processing operations on the high speed streaming inputs. Each of the MES includes one or more n-way signal registers that receive the high speed streaming inputs, one or more n-way coefficient registers that store filter coefficients for the multiple signal processing, and one or more n-way Multiply and Accumulate (MAC) units that receive the high speed streaming inputs from the one or more n-way signal registers and filter coefficients from one or more n-way coefficient registers. The one or more n-way MAC units perform a vertical MAC operation and a horizontal multiply and add operation on the high speed streaming inputs.
摘要:
A processor includes a scalar computation unit; a vector co-processor coupled to the scalar computation unit; and one or more function-specific engines coupled to the scalar computation unit, the engines adapted to minimize data exchange penalties by processing small in-out bit slices.
摘要:
A method for transferring data from a general purpose register to a vector register, the method including splatting a byte of data directly from a general purpose register (GPR) to a vector register (VR) by means of vector permute instructions, and splatting another byte of data from the GPR to the VR and vectorially combining the data in the VR.
摘要:
A system and method for vector memory array transposition. The system includes a vector memory, a block transposition accelerator, and an address controller. The vector memory stores a vector memory array. The block transposition accelerator reads a vector of a block of data within the vector memory array. The block transposition accelerator also writes a transposition of the vector of the block of data to the vector memory. The address controller determines a vector access order, and the block transposition accelerator accesses the vector of the block of data within the vector memory array according to the vector access order.