AN APPARATUS AND METHOD FOR TRANSFERRING A PLURALITY OF DATA STRUCTURES BETWEEN MEMORY AND A PLURALITY OF VECTOR REGISTERS
    1.
    Invention application
    AN APPARATUS AND METHOD FOR TRANSFERRING A PLURALITY OF DATA STRUCTURES BETWEEN MEMORY AND A PLURALITY OF VECTOR REGISTERS (under examination, published)

    Publication No.: WO2017021678A1

    Publication date: 2017-02-09

    Application No.: PCT/GB2016/051841

    Filing date: 2016-06-20

    Abstract: An apparatus and method are provided for transferring a plurality of data structures between memory and a plurality of vector registers, each vector register being arranged to store a vector operand comprising a plurality of data elements. Access circuitry is used to perform access operations to move data elements of vector operands between the data structures in memory and specified vector registers, each data structure comprising multiple data elements stored at contiguous addresses in the memory. Decode circuitry is responsive to a single access instruction identifying a plurality of vector registers and a plurality of data structures that are located discontiguously with respect to each other in the memory, to generate control signals to control the access circuitry to perform a sequence of access operations to move the plurality of data structures between the memory and the plurality of vector registers such that the vector operand in each vector register holds a corresponding data element from each of the plurality of data structures. This provides a very efficient mechanism for performing complex access operations, resulting in an increase in execution speed, and potential reductions in power consumption.
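
The data movement described in this abstract can be modelled in software as a gather followed by a de-interleave. The sketch below is only an illustration under stated assumptions: memory is a flat Python list, each "vector register" is a list with one lane per structure, and the names `load_structures` and `store_structures` are hypothetical, not taken from the application.

```python
from typing import List, Sequence


def load_structures(memory: Sequence[int], base_addrs: Sequence[int],
                    n_fields: int) -> List[List[int]]:
    """Load one structure per base address and de-interleave the fields.

    Each structure occupies n_fields contiguous addresses, but the structures
    themselves may sit anywhere in memory. Register r receives field r of
    every structure, one lane per structure.
    """
    registers = [[0] * len(base_addrs) for _ in range(n_fields)]
    for lane, base in enumerate(base_addrs):
        for field in range(n_fields):
            registers[field][lane] = memory[base + field]
    return registers


def store_structures(memory: List[int], base_addrs: Sequence[int],
                     registers: Sequence[Sequence[int]]) -> None:
    """Inverse movement: write lane `lane` of each register back to structure `lane`."""
    for lane, base in enumerate(base_addrs):
        for field, reg in enumerate(registers):
            memory[base + field] = reg[lane]


# Hypothetical memory image in which memory[a] == 100 + a.
memory = list(range(100, 132))
bases = [0, 9, 20, 5]          # four 3-element structures, discontiguous
regs = load_structures(memory, bases, 3)
```

Here `regs[0]` collects the first element of each of the four structures, `regs[1]` the second, and so on. In the apparatus this rearrangement is requested by a single access instruction; the Python loops only make the resulting layout explicit.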

    VECTORIZATION OF COLLAPSED MULTI-NESTED LOOPS
    2.
    Invention application
    VECTORIZATION OF COLLAPSED MULTI-NESTED LOOPS (under examination, published)

    Publication No.: WO2014105208A1

    Publication date: 2014-07-03

    Application No.: PCT/US2013/048794

    Filing date: 2013-06-29

    Abstract: In an embodiment a method of vectorizing a collapsed multi-nested loop includes executing, in a vector unit of a processor, the collapsed loop to obtain a vector of offsets, including for each of a plurality of iterations, calculating a scalar offset into a multi-dimensional data structure, storing the scalar offset in a data element of a first vector register, and updating a loop counter value of a multi-dimensional loop counter vector. In turn, a plurality of data elements are loaded from the multi-dimensional data structure using a base value and indexes from the vector of offsets, at least one computation is performed on the loaded plurality of data elements to obtain a plurality of results, and the plurality of results are stored into the multi-dimensional data structure using the base value and the indexes from the vector of offsets. Other embodiments are described and claimed.
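
The steps named in this abstract (build a vector of offsets while advancing a multi-dimensional loop counter, then gather, compute and scatter) can be sketched in scalar Python. This is a minimal model, not the claimed implementation: the vector length `vl`, the `strides` layout and the function names are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def collapsed_offsets(counter: List[int], bounds: Sequence[int],
                      strides: Sequence[int], vl: int) -> List[int]:
    """Return up to vl scalar offsets and advance `counter` in place.

    `counter` is the multi-dimensional loop counter (outermost dimension
    first); each offset is the dot product of the counter and the strides.
    """
    offsets: List[int] = []
    while len(offsets) < vl and counter[0] < bounds[0]:
        offsets.append(sum(c * s for c, s in zip(counter, strides)))
        # Increment the innermost dimension and carry into outer dimensions.
        d = len(counter) - 1
        counter[d] += 1
        while d > 0 and counter[d] == bounds[d]:
            counter[d] = 0
            d -= 1
            counter[d] += 1
    return offsets


def run_collapsed(data: List[int], base: int, bounds: Sequence[int],
                  strides: Sequence[int], vl: int,
                  op: Callable[[int], int]) -> None:
    """Process the whole nest vl iterations at a time."""
    counter = [0] * len(bounds)
    while counter[0] < bounds[0]:
        offsets = collapsed_offsets(counter, bounds, strides, vl)
        loaded = [data[base + o] for o in offsets]      # gather
        results = [op(x) for x in loaded]               # computation
        for o, r in zip(offsets, results):              # scatter
            data[base + o] = r


# A 2 x 3 region inside rows of pitch 4, starting at element 1.
data = list(range(10))
run_collapsed(data, base=1, bounds=[2, 3], strides=[4, 1], vl=4,
              op=lambda x: x * 10)
```

Because the two loops are collapsed, a single vector of four offsets can straddle the row boundary (offsets 0, 1, 2, 4), which is the situation a per-row vectorization would handle with a short remainder pass.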

    SOFTWARE AND HARDWARE COORDINATED PREFETCH
    3.
    Invention application
    SOFTWARE AND HARDWARE COORDINATED PREFETCH (under examination, published)

    Publication No.: WO2014101820A1

    Publication date: 2014-07-03

    Application No.: PCT/CN2013/090652

    Filing date: 2013-12-27

    Abstract: Included is an apparatus comprising a processor configured to identify a code segment in a program; analyze the code segment to determine a memory access pattern; and, if the memory access pattern is regular, turn on hardware prefetching for the code segment by setting a control register before the code segment and turn off the hardware prefetching by resetting the control register after the code segment. Also included is a method comprising identifying a code segment in a program; analyzing the code segment to determine a memory access pattern; and, if the memory access pattern is regular, turning on hardware prefetching for the code segment by setting a control register before the code segment and turning off the hardware prefetching by resetting the control register after the code segment.
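
The control flow in this abstract can be illustrated with a small simulation. Everything below is an assumption made for the sketch: the abstract does not define "regular", so a single nonzero constant stride is used as the test, and `PrefetchControl` is a hypothetical stand-in for the hardware control register. In the described scheme the analysis happens ahead of execution; here it is done at run time only to keep the example self-contained.

```python
from typing import Callable, Sequence


class PrefetchControl:
    """Hypothetical stand-in for the hardware prefetch control register."""

    def __init__(self) -> None:
        self.enabled = False


def is_regular(addresses: Sequence[int]) -> bool:
    """Assumed regularity test: one nonzero constant stride across the trace."""
    if len(addresses) < 3:
        return False
    stride = addresses[1] - addresses[0]
    return stride != 0 and all(
        b - a == stride for a, b in zip(addresses, addresses[1:])
    )


def run_segment(addresses: Sequence[int], ctrl: PrefetchControl,
                body: Callable[[], None]) -> None:
    """Run `body`, bracketing it with prefetch on/off when the pattern is regular."""
    regular = is_regular(addresses)
    if regular:
        ctrl.enabled = True       # set the control register before the segment
    try:
        body()
    finally:
        if regular:
            ctrl.enabled = False  # reset it after the segment
```

A segment with the trace `[0, 64, 128, 192]` would run with `ctrl.enabled` set, while an irregular trace leaves the register untouched.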

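    VECTOR PROCESSING DEVICE (ベクトル処理装置)
    4.
    Invention application
    VECTOR PROCESSING DEVICE (ベクトル処理装置) (under examination, published)

    Publication No.: WO2008111500A1

    Publication date: 2008-09-18

    Application No.: PCT/JP2008/054124

    Filing date: 2008-03-07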

    Inventor: 星 宗王

    Abstract: The aim is to speed up vector store instructions to a memory that is banked in units of multiple elements while keeping the increase in hardware to a minimum. In a vector processing device that has a plurality of register banks and processes a data sequence made up of a plurality of data elements held in those register banks, each register bank has a read pointer (113) that indicates the position from which a data element is read, and the start position of the read pointer (113) is varied for each register bank. The read start position for each register bank can be, for example, the consecutive number assigned to that register bank.

    MICRO PROCESSOR DEVICE AND METHOD FOR SHUFFLE OPERATIONS
    5.
    Invention application
    MICRO PROCESSOR DEVICE AND METHOD FOR SHUFFLE OPERATIONS (under examination, published)

    Publication No.: WO2006033056A2

    Publication date: 2006-03-30

    Application No.: PCT/IB2005053019

    Filing date: 2005-09-14

    Abstract: The present invention relates to a micro processor device comprising a vector processor architecture with a functional vector processor unit comprising first memory means for storing plural index vectors and processing means, the functional vector processor unit being arranged to receive a processing instruction and at least one input vector to be processed, said first memory means being arranged to provide the processing means with one of said plural index vectors in accordance with the processing instruction, and the processing means being arranged to generate in response to said instruction at least one output vector having the elements of the at least one input vector rearranged in accordance with the one index vector provided. The functional vector processor unit further comprises pre-processing means arranged to receive a parameter and to process the elements of the one index vector dependent on said parameter before generating said at least one output vector in accordance with the processed index vector. The invention further relates to a method for processing vectors with such a functional vector-processing unit.
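
The shuffle described here (an instruction selects a stored index vector, a pre-processing step adjusts it using a parameter, and the input vector is rearranged accordingly) can be sketched as follows. The abstract does not say what the pre-processing does, so adding the parameter modulo the vector length is purely an assumed example; the table contents and all names are hypothetical.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical "first memory means": index vectors selected by the instruction.
INDEX_TABLE: Dict[str, List[int]] = {
    "identity": [0, 1, 2, 3],
    "reverse": [3, 2, 1, 0],
}


def preprocess(index_vec: Sequence[int], param: int, n: int) -> List[int]:
    """Assumed pre-processing: offset every index by `param`, wrapping at n."""
    return [(i + param) % n for i in index_vec]


def shuffle(input_vec: Sequence[T], index_vec: Sequence[int]) -> List[T]:
    """Output element k is the input element named by index_vec[k]."""
    return [input_vec[i] for i in index_vec]


def shuffle_instruction(op: str, input_vec: Sequence[T], param: int = 0) -> List[T]:
    """Select an index vector, pre-process it, then rearrange the input."""
    idx = preprocess(INDEX_TABLE[op], param, len(input_vec))
    return shuffle(input_vec, idx)
```

With this choice of pre-processing, one stored "identity" pattern plus a parameter yields every rotation of the input, which suggests why parameterising a stored index vector can reduce the number of patterns that must be held in memory.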

    RECONFIGURABLE PROCESSING SYSTEM AND METHOD
    6.
    Invention application
    RECONFIGURABLE PROCESSING SYSTEM AND METHOD (under examination, published)

    Publication No.: WO0237264A3

    Publication date: 2003-05-01

    Application No.: PCT/US0145810

    Filing date: 2001-11-02

    Applicant: BROADCOM CORP

    Abstract: A reconfigurable processing system executes instructions and configurations in parallel. Initially, a first instruction loads configurations into configuration registers. The configuration field of a subsequently fetched instruction selects a configuration register. The instruction controls and the controls of the configuration in the selected configuration register are decoded and modified as specified by the instruction. The controls provide data operands to the execution units, which process the operands and generate results. Scalar data, vector data, or a combination of scalar and vector data can be processed. The processing is controlled by instructions executed in parallel with configurations invoked by configuration fields within the instructions. Vectors are processed using a vector register file which stores vectors. A vector address unit identifies addresses of vector elements in the vector register file to be processed. For each vector, vector address units provide addresses which stride through each element of each vector.

    NON-INTEGRAL MULTIPLE SIZE ARRAY LOOP PROCESSING IN SIMD ARCHITECTURE
    8.
    Invention application
    NON-INTEGRAL MULTIPLE SIZE ARRAY LOOP PROCESSING IN SIMD ARCHITECTURE (under examination, published)

    Publication No.: WO02039271A1

    Publication date: 2002-05-16

    Application No.: PCT/US2001/050029

    Filing date: 2001-11-09

    Abstract: A method of controlling the enabling of processor datapaths in a SIMD processor during a loop processing operation is described. The information used by the method includes an allocation between the data items and a memory (20), a size of the array, and a number of remaining parallel passes of the datapaths in the loop processing operation. A computer instruction (12) is also provided, which includes a loop handling instruction that specifies the enabling of one of a plurality of processor datapaths during processing an array of data items. The instruction includes a count field that specifies the number of remaining parallel loop passes to process the array and a count field that specifies the number of serial loop passes to process the array. Different instructions can be used to handle different allocations of passes to parallel datapaths. The instruction also uses information about the total number of datapaths (18).
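
The enabling decision described in this abstract can be worked through for one particular allocation. The sketch assumes that element i is handled by datapath i mod P on parallel pass i div P, and that the "remaining passes" count includes the current pass; the abstract allows other allocations, and the function names are hypothetical.

```python
from typing import List


def enabled_datapaths(array_size: int, num_datapaths: int,
                      remaining_passes: int) -> List[bool]:
    """Return one enable flag per datapath for the current parallel pass.

    Uses only the quantities the abstract lists: the array size, the number
    of datapaths, and the number of remaining parallel passes.
    """
    total_passes = -(-array_size // num_datapaths)   # ceiling division
    current_pass = total_passes - remaining_passes
    first_element = current_pass * num_datapaths
    return [first_element + d < array_size for d in range(num_datapaths)]


def simd_increment(data: List[int], num_datapaths: int) -> None:
    """Add 1 to every element, masking off datapaths that fall past the end."""
    total_passes = -(-len(data) // num_datapaths)
    for remaining in range(total_passes, 0, -1):
        mask = enabled_datapaths(len(data), num_datapaths, remaining)
        base = (total_passes - remaining) * num_datapaths
        for d, on in enumerate(mask):
            if on:
                data[base + d] += 1
```

For an array of 10 items on 4 datapaths, the first two passes enable all four datapaths and the last pass enables only two, so no separate scalar clean-up loop is needed for the array size that is not a multiple of the datapath count.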

    RECIRCULATING REGISTER FILE
    9.
    Invention application
    RECIRCULATING REGISTER FILE (under examination, published)

    Publication No.: WO99061997A1

    Publication date: 1999-12-02

    Application No.: PCT/GB1999/000707

    Filing date: 1999-03-09

    Abstract: A floating point unit having a register bank containing a plurality of registers supports vector operations that execute a specified operation a plurality of times upon a sequence of data values from different registers. The register bank is divided into subsets, with the sequence of registers used in a vector operation wrapping within a subset. The subsets comprise disjoint, contiguous ranges of register numbers. The wrapping within ranges allows compact and efficient code to be provided for performing DSP operations, such as FIR filtering and matrix transformations.
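
The wrapping behaviour can be made concrete with a short register-number calculation. The subset size of 8 and the `stride` parameter are assumptions chosen for illustration (the abstract only says the subsets are disjoint, contiguous ranges), and `vector_register_sequence` is a hypothetical name.

```python
from typing import List


def vector_register_sequence(start: int, length: int, stride: int = 1,
                             bank_size: int = 8) -> List[int]:
    """List the registers touched by one vector operation.

    Register numbers advance by `stride` and wrap within the subset that
    contains `start`, never crossing into a neighbouring subset.
    """
    bank_base = (start // bank_size) * bank_size
    offset = start - bank_base
    return [bank_base + (offset + k * stride) % bank_size for k in range(length)]
```

Under these assumptions a four-element operation starting at register 14 uses registers 14, 15, 8 and 9, staying inside the subset 8 to 15. Wrapping of this kind lets a sliding window of operands, such as FIR filter taps, be addressed by changing only the starting register.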

    MIXED VECTOR/SCALAR REGISTER FILE
    10.
    Invention application
    MIXED VECTOR/SCALAR REGISTER FILE (under examination, published)

    Publication No.: WO99061996A1

    Publication date: 1999-12-02

    Application No.: PCT/GB1999/000701

    Filing date: 1999-03-09

    Abstract: A floating point unit is provided with a register bank comprising 32 registers that may be used as either vector registers or scalar registers. A data processing instruction includes at least one register specifying field pointing to a register containing a data value to be used in that operation. An increase in the instruction bit space available to encode more opcodes or to allow for more registers is provided by encoding whether a register is to be treated as a vector or a scalar within the register field itself. Further, the register field for one register of the instruction may encode whether another register is a vector or a scalar. The registers can be initially accessed using the values within the register fields of the instruction independently of the opcode, allowing for easier decode.
