SYSTEMS AND METHODS OF DATA EXTRACTION IN A VECTOR PROCESSOR
    42.
    Type: Invention application
    Status: In force

    Publication No.: US20140059323A1

    Publication date: 2014-02-27

    Application No.: US13592617

    Filing date: 2012-08-23

    IPC classification: G06F15/76

    Abstract: Systems and methods of data extraction in a vector processor are disclosed. In a particular embodiment, a method of data extraction in a vector processor includes copying at least one data element to a source register of a permutation network. The method includes reordering multiple data elements of the source register, populating a destination register of the permutation network with the reordered data elements, and copying the reordered data elements from the destination register to a memory.
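
The four steps named in this abstract (copy to source register, reorder, populate destination register, copy to memory) can be modelled in a few lines of Python. This is only a software sketch: the function name `extract`, the register width, the zero padding of unused lanes, and the index-list form of the permutation are all assumptions made for illustration, not details taken from the application.

```python
def extract(memory_in, indices, width=8):
    """Software model of data extraction through a permutation network.

    memory_in : elements to load (assumed to fit in one register)
    indices   : indices[i] is the source lane routed to destination lane i
    width     : number of lanes in the source register (hypothetical)
    """
    # Step 1: copy the data elements into the source register.
    source = list(memory_in[:width])
    source += [0] * (width - len(source))  # assumption: unused lanes hold zero

    # Steps 2 and 3: reorder the elements and populate the destination register.
    destination = [source[j] for j in indices]

    # Step 4: copy the reordered elements back to memory (returned as a list).
    return list(destination)
```

For example, `extract([10, 20, 30, 40], [3, 1, 0, 2], width=4)` routes lane 3 to position 0, lane 1 to position 1, and so on.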


    VECTOR REGISTER FILE
    43.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20140047211A1

    Publication date: 2014-02-13

    Application No.: US13572886

    Filing date: 2012-08-13

    IPC classification: G06F15/76

    Abstract: An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder.
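
The read path described here (decode the register address, then use that register's own read element counter to pick a word) can be illustrated with a small Python model. The class names are hypothetical, and two behaviours are assumptions that the abstract does not state: the counter advances by one after each read, and it wraps around at the end of the register.

```python
class VectorRegister:
    """One vector register: a small memory array plus a read element counter."""

    def __init__(self, words):
        self.words = list(words)  # memory array, one word per element
        self.read_counter = 0     # read element counter

    def read(self):
        element_address = self.read_counter
        word = self.words[element_address]
        # Assumption: the counter steps to the next element and wraps around.
        self.read_counter = (self.read_counter + 1) % len(self.words)
        return word


class VectorRegisterFile:
    """A register file whose read port takes only a vector register address."""

    def __init__(self, registers):
        self.registers = list(registers)

    def read(self, register_address):
        # Address decode: select one vector register of the file.
        selected = self.registers[register_address]
        # The element address comes from the selected register's own counter.
        return selected.read()
```

In this model the caller never supplies an element index: repeated reads of the same register address step through that register's elements.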


    Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture
    44.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US08650240B2

    Publication date: 2014-02-11

    Application No.: US12542324

    Filing date: 2009-08-17

    IPC classification: G06F7/52

    Abstract: Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
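
A scalar Python model can make the load, splat, and cross multiply-add sequence concrete. It is a sketch under stated assumptions: registers are modelled as lists of interleaved real and imaginary parts, the matrix is stored column by column, and the helper names `vector_load`, `load_splat`, `cross_multiply_add`, and `complex_matvec` are invented here. The patent's actual instruction encoding and lane layout are not reproduced.

```python
def vector_load(column):
    """Load complex values as interleaved [re0, im0, re1, im1, ...] lanes."""
    return [part for z in column for part in (z.real, z.imag)]


def load_splat(value, pairs):
    """Load one complex value and replicate it across all lane pairs."""
    return [value.real, value.imag] * pairs


def cross_multiply_add(acc, a, b):
    """Accumulate the complex products of corresponding lane pairs.

    (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
    """
    out = list(acc)
    for k in range(0, len(a), 2):
        ar, ai = a[k], a[k + 1]
        br, bi = b[k], b[k + 1]
        out[k] += ar * br - ai * bi
        out[k + 1] += ar * bi + ai * br
    return out


def complex_matvec(columns, x):
    """Compute y = A x, with A given as a list of columns (assumed layout)."""
    pairs = len(columns[0])
    acc = [0.0] * (2 * pairs)             # result vector register
    for column, xj in zip(columns, x):
        a = vector_load(column)           # first operand
        b = load_splat(xj, pairs)         # second operand, replicated
        acc = cross_multiply_add(acc, a, b)  # accumulate the partial product
    return [complex(acc[k], acc[k + 1]) for k in range(0, len(acc), 2)]
```

Each loop iteration contributes one partial product (one column of A scaled by one element of x), which is why the second operand is splatted rather than loaded element by element.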


    EFFICIENT HARDWARE INSTRUCTIONS FOR SINGLE INSTRUCTION MULTIPLE DATA PROCESSORS
    45.
    Type: Invention application
    Status: In force

    Publication No.: US20140013078A1

    Publication date: 2014-01-09

    Application No.: US14023265

    Filing date: 2013-09-10

    IPC classification: G06F9/38

    Abstract: A method and apparatus for efficiently processing data in various formats in a single instruction multiple data (“SIMD”) architecture is presented. Specifically, a method to unpack fixed-width bit values in a bit stream to a fixed-width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run length encoded compressed bit-vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit-vector in a SIMD architecture is presented. A method to fetch bits from a bit-vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.
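
Two of the operations listed in this abstract are easy to state as scalar reference code, which shows what a SIMD version would have to compute. The sketch below covers unpacking fixed-width bit values into bytes and returning the offsets of set bits. The function names and the least-significant-bit-first packing order are assumptions; the application may define a different bit order, and nothing here reflects how the hardware instructions are implemented.

```python
def unpack_fixed_width(data, width, count):
    """Unpack `count` values of `width` bits each into one byte per value.

    Assumption: values are packed least-significant-bit first.
    """
    if not 1 <= width <= 8:
        raise ValueError("width must be between 1 and 8 bits")
    stream = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return bytes((stream >> (i * width)) & mask for i in range(count))


def set_bit_offsets(bitvector):
    """Return the offset of every bit set to one (same bit order as above)."""
    stream = int.from_bytes(bitvector, "little")
    return [i for i in range(8 * len(bitvector)) if (stream >> i) & 1]
```

A SIMD implementation would produce the same outputs but handle many values per instruction; this version is useful mainly as a correctness reference.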


    INSTRUCTION AND LOGIC TO PERFORM DYNAMIC BINARY TRANSLATION
    47.
    Type: Invention application
    Status: In force

    Publication No.: US20130283249A1

    Publication date: 2013-10-24

    Application No.: US13995400

    Filing date: 2011-09-30

    IPC classification: G06F9/45

    Abstract: A micro-architecture may provide a hardware and software co-designed dynamic binary translation. The micro-architecture may invoke a method to perform a dynamic binary translation. The method may comprise executing original software code compiled targeting a first instruction set, using processor hardware to detect a hot spot in the software code and passing control to a binary translation translator, determining a hot spot region for translation, generating the translated code using a second instruction set, placing the translated code in a translation cache, executing the translated code from the translation cache, and transitioning back to the original software code after the translated code finishes execution.
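
The control flow in this abstract (run original code, detect a hot spot, translate it, cache the translation, run from the cache, return to the original code) can be sketched in software. In the sketch below a plain execution counter stands in for the hardware hot-spot detector, and `HOT_THRESHOLD`, `DynamicTranslator`, `interpret`, and `translate` are hypothetical names chosen for illustration. It shows only the dispatch logic, not instruction-set translation itself.

```python
HOT_THRESHOLD = 3  # hypothetical: executions before a block counts as hot


class DynamicTranslator:
    def __init__(self, interpret, translate):
        self.interpret = interpret  # runs original code: (addr, state) -> state
        self.translate = translate  # builds translated code: addr -> callable
        self.counts = {}            # software stand-in for hot-spot detection
        self.cache = {}             # translation cache, keyed by block address

    def run(self, addr, state):
        translated = self.cache.get(addr)
        if translated is not None:
            # Execute from the translation cache; returning from this call
            # models the transition back to the original code.
            return translated(state)

        self.counts[addr] = self.counts.get(addr, 0) + 1
        if self.counts[addr] >= HOT_THRESHOLD:
            # Hot spot detected: translate the region and cache the result.
            self.cache[addr] = self.translate(addr)
        return self.interpret(addr, state)
```

With this threshold, the first three executions of a block run the original code and later executions run the cached translation.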


    Selective register reset
    48.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08499134B2

    Publication date: 2013-07-30

    Application No.: US13421289

    Filing date: 2012-03-15

    Applicant: June Lee

    Inventor: June Lee

    IPC classification: G06F13/28

    Abstract: The present disclosure includes methods, devices, modules, and systems for selective register reset. One method embodiment includes receiving an indication of a die and a plane associated with at least one address cycle. Such a method can also include selectively resetting a particular register of a number of registers, the particular register corresponding to the plane and the die.
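
As a rough illustration of selective reset, the Python sketch below keeps one register per (die, plane) pair and clears only the register selected by an address cycle. The bit layout used by `decode_address_cycle` (bit 0 for the plane, bit 1 for the die) is purely hypothetical, as are the class and method names. The patent does not specify this encoding in its abstract.

```python
def decode_address_cycle(byte):
    """Hypothetical layout: bit 0 selects the plane, bit 1 selects the die."""
    plane = byte & 0x1
    die = (byte >> 1) & 0x1
    return die, plane


class RegisterBank:
    """One register per (die, plane) pair."""

    def __init__(self, dies=2, planes=2):
        self.regs = {(d, p): 0 for d in range(dies) for p in range(planes)}

    def write(self, die, plane, value):
        self.regs[(die, plane)] = value

    def selective_reset(self, address_cycle_byte):
        # Reset only the register that matches the indicated die and plane.
        die, plane = decode_address_cycle(address_cycle_byte)
        self.regs[(die, plane)] = 0
        return die, plane
```

All other registers keep their contents, which is the point of a selective reset as opposed to a global one.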


    Method and apparatus for efficient bi-linear interpolation and motion compensation
    49.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08463837B2

    Publication date: 2013-06-11

    Application No.: US10687953

    Filing date: 2003-10-17

    IPC classification: G06F17/10

    Abstract: A method and apparatus for performing bi-linear interpolation and motion compensation including multiply-add operations and byte shuffle operations on packed data in a processor. In one embodiment, two or more lines of 2n+1 content byte elements may be shuffled to generate a first and second packed data respectively including at least a first and a second 4n byte elements including 2n−1 duplicated elements. A third packed data including sums of products is generated from the first packed data and packed byte coefficients by a multiply-add instruction. A fourth packed data including sums of products is generated from the second packed data and elements and packed byte coefficients by another multiply-add instruction. Corresponding sums of products of the third and fourth packed data are then summed, and may be rounded and averaged.
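
The shuffle and multiply-add sequence can be followed with a scalar Python model. Shuffling a line of 2n+1 bytes into adjacent pairs gives 4n bytes in which the 2n−1 interior bytes appear twice, matching the counts in the abstract. The weighting scheme is an assumption made for this sketch: fractional offsets `fx` and `fy` are in eighths of a pixel, so the four weights sum to 64 and the result is rounded with `(sum + 32) >> 6`. The function names are invented for illustration.

```python
def shuffle_pairs(line):
    """[p0, p1, p2, ...] -> [p0, p1, p1, p2, p2, p3, ...] (adjacent pairs)."""
    out = []
    for i in range(len(line) - 1):
        out += [line[i], line[i + 1]]
    return out


def multiply_add(packed, c0, c1):
    """Multiply each byte pair by (c0, c1) and add the two products."""
    return [packed[k] * c0 + packed[k + 1] * c1
            for k in range(0, len(packed), 2)]


def bilinear_row(top, bottom, fx, fy):
    """Interpolate one output row from two input lines.

    Assumption: fx and fy are offsets in eighths (0..8), weights sum to 64.
    """
    third = multiply_add(shuffle_pairs(top), (8 - fx) * (8 - fy), fx * (8 - fy))
    fourth = multiply_add(shuffle_pairs(bottom), (8 - fx) * fy, fx * fy)
    # Sum corresponding sums of products, then round and scale.
    return [(s + t + 32) >> 6 for s, t in zip(third, fourth)]
```

With `fx = fy = 4` every weight is 16, so each output is the rounded average of a 2×2 neighbourhood.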
