Matrix Multiplication Operations Using Pair-Wise Load and Splat Operations
    1.
    Invention patent application (status: in force)

    Publication No.: US20120011348A1

    Publication date: 2012-01-12

    Application No.: US12834464

    Filing date: 2010-07-12

    IPC classification: G06F9/302

    Abstract: Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.
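
    Illustration (not part of the patent record): the steps in the abstract above can be sketched in plain Python, with lists standing in for vector registers. The register width of 4, the helper names, and the memory layout of the first operand (each A element stored twice so that a plain vector load yields [a0k, a0k, a1k, a1k]) are assumptions made for this sketch; the abstract does not specify them.

```python
VLEN = 4  # assumed register width (elements per vector register)

def vector_load(mem, addr):
    """Load VLEN consecutive elements into a 'register' (a list)."""
    return list(mem[addr:addr + VLEN])

def pair_load_splat(mem, addr):
    """Load a pair of scalars and replicate the pair across the register."""
    pair = list(mem[addr:addr + 2])
    return pair * (VLEN // 2)  # [p0, p1, p0, p1]

def multiply_add(acc, x, y):
    """Element-wise acc + x * y, accumulating a partial product."""
    return [c + a * b for c, a, b in zip(acc, x, y)]

def matmul_2xk_kx2(a_mem, b_mem, k_dim):
    """C = A (2 x K) times B (K x 2), returned as [c00, c01, c10, c11].

    a_mem: assumed layout, VLEN values per k: [a0k, a0k, a1k, a1k]
    b_mem: B stored row-major, so row k is the pair (bk0, bk1)
    """
    acc = [0.0] * VLEN
    for k in range(k_dim):
        va = vector_load(a_mem, VLEN * k)    # first operand
        vb = pair_load_splat(b_mem, 2 * k)   # [bk0, bk1, bk0, bk1]
        acc = multiply_add(acc, va, vb)      # accumulate partial product
    return acc

# A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]]
a_mem = [1, 1, 3, 3, 2, 2, 4, 4]
b_mem = [5, 6, 7, 8]
result = matmul_2xk_kx2(a_mem, b_mem, 2)
```

    Each loop iteration adds one rank-1 update to the accumulator; a wider B would repeat the same step with the next pair of scalars from the same row.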

    Matrix multiplication operations with data pre-conditioning in a high performance computing architecture
    5.
    Granted invention patent (status: lapsed)

    Publication No.: US08577950B2

    Publication date: 2013-11-05

    Application No.: US12542255

    Filing date: 2009-08-17

    IPC classification: G06F7/52

    Abstract: Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicate the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.
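
    Illustration (not part of the patent record): a minimal Python sketch of the load, splat, and multiply-add sequence described in the abstract above, with lists standing in for vector registers. The register width of 4, the column-major storage of both matrices, and all function names are assumptions made for this sketch.

```python
VLEN = 4  # assumed register width (elements per vector register)

def vector_load(mem, addr):
    """Load VLEN consecutive elements into a 'register' (a list)."""
    return list(mem[addr:addr + VLEN])

def load_splat(mem, addr):
    """Load one scalar and replicate it to every element of the register."""
    return [mem[addr]] * VLEN

def multiply_add(acc, x, y):
    """Element-wise acc + x * y, accumulating a partial product."""
    return [c + a * b for c, a, b in zip(acc, x, y)]

def matmul_column(a_colmajor, b_colmajor, k_dim, j):
    """Column j of C = A (VLEN x K) times B (K x N).

    Both matrices are assumed to be stored column-major, so column k of A
    starts at VLEN * k and element (k, j) of B sits at j * k_dim + k.
    """
    acc = [0.0] * VLEN
    for k in range(k_dim):
        va = vector_load(a_colmajor, VLEN * k)       # column k of A
        vb = load_splat(b_colmajor, j * k_dim + k)   # b_kj in every lane
        acc = multiply_add(acc, va, vb)              # accumulate partial product
    return acc

# A has rows [1, 2], [3, 4], [5, 6], [7, 8]; B is the single column [10, 100].
a_colmajor = [1, 3, 5, 7, 2, 4, 6, 8]
b_colmajor = [10, 100]
column0 = matmul_column(a_colmajor, b_colmajor, 2, 0)
```

    Splatting the B element lets one multiply-add update a whole column of C at once, so no horizontal reduction across lanes is needed.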

    Optimized Scalar Promotion with Load and Splat SIMD Instructions
    6.
    Invention patent application (status: lapsed)

    Publication No.: US20120290816A1

    Publication date: 2012-11-15

    Application No.: US13555435

    Filing date: 2012-07-23

    IPC classification: G06F9/30

    CPC classification: G06F8/45

    Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.

    Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture
    7.
    Invention patent application (status: lapsed)

    Publication No.: US20110040821A1

    Publication date: 2011-02-17

    Application No.: US12542255

    Filing date: 2009-08-17

    IPC classification: G06F17/16 G06F7/52

    Abstract: Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicate the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.

    Complex Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture
    8.
    Invention patent application (status: lapsed)

    Publication No.: US20110040822A1

    Publication date: 2011-02-17

    Application No.: US12542324

    Filing date: 2009-08-17

    IPC classification: G06F17/16 G06F7/52

    Abstract: Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
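
    Illustration (not part of the patent record): a Python sketch of the complex variant described in the abstract above, with lists standing in for vector registers. The interleaved [re, im, re, im] register layout, the register width of 4, the column-major storage of A, and the single-step cross_multiply_add helper are all assumptions; real hardware may split the cross multiply add into more than one instruction.

```python
VLEN = 4  # assumed register width: two complex values as [re, im, re, im]

def vector_load(mem, addr):
    """Load VLEN consecutive elements into a 'register' (a list)."""
    return list(mem[addr:addr + VLEN])

def complex_load_splat(mem, addr):
    """Load one complex value (re, im) and replicate it across the register."""
    return [mem[addr], mem[addr + 1]] * (VLEN // 2)

def cross_multiply_add(acc, x, y):
    """Per complex slot: acc += x * y using the complex product rule."""
    out = list(acc)
    for i in range(0, VLEN, 2):
        xr, xi = x[i], x[i + 1]
        yr, yi = y[i], y[i + 1]
        out[i] += xr * yr - xi * yi        # real part
        out[i + 1] += xr * yi + xi * yr    # imaginary part
    return out

def complex_matvec(a_mem, b_mem, k_dim):
    """c = A (2 x K complex) times b (K complex), as [c0.re, c0.im, c1.re, c1.im].

    a_mem: column-major, interleaved; column k starts at VLEN * k
    b_mem: interleaved [b0.re, b0.im, b1.re, b1.im, ...]
    """
    acc = [0.0] * VLEN
    for k in range(k_dim):
        va = vector_load(a_mem, VLEN * k)         # two complex rows of column k
        vb = complex_load_splat(b_mem, 2 * k)     # b_k in both complex slots
        acc = cross_multiply_add(acc, va, vb)     # accumulate partial product
    return acc

# A = [[1+2j, 3+4j], [5+6j, 7+8j]], b = [1+1j, 2-1j]
a_mem = [1, 2, 5, 6, 3, 4, 7, 8]
b_mem = [1, 1, 2, -1]
c = complex_matvec(a_mem, b_mem, 2)
```

    The result can be checked against Python's built-in complex type, for example (1+2j)*(1+1j) + (3+4j)*(2-1j) for the first slot.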

    Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture
    9.
    Granted invention patent (status: lapsed)

    Publication No.: US08650240B2

    Publication date: 2014-02-11

    Application No.: US12542324

    Filing date: 2009-08-17

    IPC classification: G06F7/52

    Abstract: Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.

    Optimized Scalar Promotion with Load and Splat SIMD Instructions
    10.
    Invention patent application (status: lapsed)

    Publication No.: US20090307656A1

    Publication date: 2009-12-10

    Application No.: US12134495

    Filing date: 2008-06-06

    IPC classification: G06F9/44

    CPC classification: G06F8/45

    Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
