Complex Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture
    Type: Patent application
    Status: Lapsed

    Publication No.: US20110040822A1

    Publication Date: 2011-02-17

    Application No.: US12542324

    Filing Date: 2009-08-17

    IPC Classification: G06F17/16 G06F7/52

    Abstract: Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation into a first target vector register. The first vector operand comprises the real and imaginary parts of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has real and imaginary parts. A cross multiply-add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products, and the resulting accumulated partial product is stored in a result vector register.
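
    The abstract describes an inner loop built from three steps: a vector load of two complex values, a load-and-splat of one complex value, and a cross multiply-add into an accumulator. The Python sketch below emulates that loop on 4-lane registers laid out as [re0, im0, re1, im1]. The register layout, the function names, and the single-step semantics given to `cross_multiply_add` are illustrative assumptions, not the instruction definitions from the patent.

```python
from typing import List

# Hypothetical 4-lane register holding two interleaved complex values: [re0, im0, re1, im1].
Vec = List[float]


def vector_load(row: List[complex], j: int) -> Vec:
    """Load row[j] and row[j + 1] into one register, real and imaginary parts interleaved."""
    a, b = row[j], row[j + 1]
    return [a.real, a.imag, b.real, b.imag]


def complex_load_splat(z: complex) -> Vec:
    """Replicate a single complex value into both complex lanes of a register."""
    return [z.real, z.imag, z.real, z.imag]


def cross_multiply_add(acc: Vec, x: Vec, s: Vec) -> Vec:
    """Return acc + x * s, treating each (re, im) lane pair as one complex number."""
    out = list(acc)
    for lane in (0, 2):
        xr, xi = x[lane], x[lane + 1]
        sr, si = s[lane], s[lane + 1]
        out[lane] += xr * sr - xi * si      # real part of the partial product
        out[lane + 1] += xr * si + xi * sr  # imaginary part of the partial product
    return out


def complex_matmul(A: List[List[complex]], B: List[List[complex]]) -> List[List[complex]]:
    """C = A @ B, computing two output columns at a time in one accumulator register."""
    n, k, m = len(A), len(B), len(B[0])
    if m % 2:
        raise ValueError("this sketch assumes B has an even number of columns")
    C = [[0j] * m for _ in range(n)]
    for i in range(n):
        for j in range(0, m, 2):
            acc = [0.0, 0.0, 0.0, 0.0]
            for p in range(k):
                # Splat A[i][p] and multiply it against two consecutive entries of row p of B.
                acc = cross_multiply_add(acc, vector_load(B[p], j), complex_load_splat(A[i][p]))
            C[i][j], C[i][j + 1] = complex(acc[0], acc[1]), complex(acc[2], acc[3])
    return C
```

    Keeping `acc` live across the `p` loop mirrors the abstract's accumulation of partial products before the result is stored.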

    Method and structure for low latency load-tagged pointer instruction for computer microarchitechture
    Type: Granted patent
    Status: In force

    Publication No.: US07849293B2

    Publication Date: 2010-12-07

    Application No.: US12023791

    Filing Date: 2008-01-31

    IPC Classification: G06F9/44 G06F9/312

    Abstract: A methodology and implementation of a load-tagged pointer instruction for RISC-based microarchitectures is presented. A first, lower-latency speculative implementation reduces overall throughput latency for a microprocessor system by estimating the result of a particular instruction and confirming the integrity of the estimate slightly after the normal instruction execution latency has elapsed. A second, higher-latency non-speculative implementation, which always produces correct results, is invoked by the first when the first guesses incorrectly. The methodologies and structures disclosed herein are intended to be combined with predictive techniques for instruction processing to ultimately improve processing throughput.
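
    The speculate-verify-fallback structure in this abstract can be sketched as follows. The abstract does not say what the fast path guesses, so this sketch assumes it guesses that the stored pointer's tag is valid, and assumes that an invalid tag yields a null pointer. The cycle counts and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical latencies in cycles, for illustration only.
FAST_LATENCY = 2   # speculative path delivers its estimate quickly
SLOW_LATENCY = 6   # non-speculative path always checks before answering


@dataclass
class TaggedQuadword:
    tag_valid: bool  # assumed tag bit that qualifies the stored pointer
    pointer: int


def load_tagged_pointer_slow(qw: TaggedQuadword) -> Tuple[int, int]:
    """Non-speculative path: check the tag, then produce the (always correct) result."""
    result = qw.pointer if qw.tag_valid else 0  # assumption: a bad tag yields a null pointer
    return result, SLOW_LATENCY


def load_tagged_pointer(qw: TaggedQuadword) -> Tuple[int, int]:
    """Speculative path: guess the tag is valid, confirm, and fall back on a misguess."""
    estimate = qw.pointer
    if qw.tag_valid:  # confirmation succeeds: keep the fast estimate
        return estimate, FAST_LATENCY
    result, slow = load_tagged_pointer_slow(qw)
    return result, FAST_LATENCY + slow  # a misguess pays for both attempts
```

    The scheme only pays off when the guess is usually right, which is why the abstract pairs it with predictive techniques.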

    Multi-Addressable Register File
    Type: Patent application
    Status: Lapsed

    Publication No.: US20090198966A1

    Publication Date: 2009-08-06

    Application No.: US12023720

    Filing Date: 2008-01-31

    IPC Classification: G06F9/30

    Abstract: A single register file may be addressed using both scalar and SIMD instructions. That is, subsets of registers within a multi-addressable register file according to the illustrative embodiments are addressable with different instruction forms (e.g., scalar instructions, SIMD instructions), while the entire set of registers may be addressed with yet another form of instructions, referred to herein as Vector-Scalar Extension (VSX) instructions. The set of operations that may be performed on the entire set of registers using the VSX instruction form is substantially similar to the operation sets available for the subsets of registers. Such an arrangement allows legacy instructions to access subsets of registers within the multi-addressable register file, while new instructions, i.e., the VSX instructions, may access the entire range of registers within the multi-addressable register file.
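
    One way to picture the multi-addressable register file is the VSX layout in Power ISA 2.06, where 64 128-bit vector-scalar registers (VSRs) overlay the 32 floating-point registers (doubleword 0 of VSR 0-31) and the 32 vector registers (VSR 32-63). The sketch below models that overlay; the register counts and mapping are taken from Power ISA as an assumption, not from this abstract, and the class and method names are hypothetical.

```python
from typing import Tuple


class MultiAddressableRegisterFile:
    """64 x 128-bit registers, each stored as [doubleword0, doubleword1]."""

    NUM_VSRS = 64
    LEGACY_COUNT = 32

    def __init__(self) -> None:
        self.vsr = [[0, 0] for _ in range(self.NUM_VSRS)]

    # VSX form: any of the 64 registers, full width.
    def read_vsr(self, i: int) -> Tuple[int, int]:
        return tuple(self.vsr[i])

    def write_vsr(self, i: int, dw0: int, dw1: int) -> None:
        self.vsr[i] = [dw0, dw1]

    # Legacy scalar floating-point form: FPR i aliases doubleword 0 of VSR i.
    def read_fpr(self, i: int) -> int:
        self._check_legacy(i)
        return self.vsr[i][0]

    def write_fpr(self, i: int, value: int) -> None:
        self._check_legacy(i)
        self.vsr[i][0] = value  # doubleword 1 is left unchanged in this sketch

    # Legacy SIMD form: VR i aliases all of VSR 32 + i.
    def read_vr(self, i: int) -> Tuple[int, int]:
        self._check_legacy(i)
        return tuple(self.vsr[self.LEGACY_COUNT + i])

    def write_vr(self, i: int, dw0: int, dw1: int) -> None:
        self._check_legacy(i)
        self.vsr[self.LEGACY_COUNT + i] = [dw0, dw1]

    def _check_legacy(self, i: int) -> None:
        if not 0 <= i < self.LEGACY_COUNT:
            raise IndexError("legacy instruction forms can only name 32 registers")
```

    Legacy code keeps its 32-register view, while VSX-form code sees one flat file of 64 registers.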

    DESIGN STRUCTURE FOR PREDICTIVE DECODING
    Type: Patent application
    Status: In force

    Publication No.: US20090119494A1

    Publication Date: 2009-05-07

    Application No.: US11933774

    Filing Date: 2007-11-01

    IPC Classification: G06F9/38

    Abstract: A design structure embodied in a machine readable medium used in a design process includes an apparatus for predictive decoding, the apparatus including register logic for fetching an instruction; predictor logic containing predictor information including prior instruction execution characteristics; logic for obtaining predictor information for the fetched instruction from the predictor; and decode logic for generating a selected one of a plurality of decode operation streams corresponding to the fetched instruction, wherein the decode operation stream is selected based on the predictor information.
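
    The abstract does not name the "prior instruction execution characteristic" or the alternative decode streams. This sketch assumes the characteristic is whether a load at a given address was unaligned last time, and that the two streams are a single operation versus a split-and-merge sequence. The predictor table and operation names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class PredictiveDecoder:
    """Select one of several decode streams for an instruction using per-address history."""

    def __init__(self) -> None:
        # Hypothetical predictor: instruction address -> True if its last execution was unaligned.
        self.history: Dict[int, bool] = defaultdict(bool)

    def decode(self, address: int, mnemonic: str) -> List[str]:
        if mnemonic == "load" and self.history[address]:
            # Stream for the predicted-unaligned case: split the access and merge the halves.
            return ["load_high_part", "load_low_part", "merge"]
        # Default stream: the instruction maps to a single internal operation.
        return [mnemonic]

    def update(self, address: int, was_unaligned: bool) -> None:
        """Record the execution characteristic observed after the instruction completes."""
        self.history[address] = was_unaligned
```

    The sketch omits what happens on a wrong prediction; a real design would have to detect that case and recover.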

    Copying character data having a termination character from one memory location to another
    Type: Granted patent
    Status: In force

    Publication No.: US09454366B2

    Publication Date: 2016-09-27

    Application No.: US13421498

    Filing Date: 2012-03-15

    IPC Classification: G06F12/00 G06F9/30

    Abstract: Copying characters of a set of terminated character data from one memory location to another memory location using parallel processing and without causing unwarranted exceptions. The character data to be copied is loaded within one or more vector registers. In particular, in one embodiment, an instruction (e.g., a Vector Load to Block Boundary instruction) is used that loads data in parallel in a vector register to a specified boundary, and provides a way to determine the number of characters loaded. To determine the number of characters loaded (a count), another instruction (e.g., a Load Count to Block Boundary instruction) is used. Further, an instruction (e.g., a Vector Find Element Not Equal instruction) is used to find the index of the first delimiter character, i.e., the first termination character, such as a zero or null character within the character data. This instruction checks a plurality of bytes of data in parallel.
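
    The copy loop can be sketched in Python with simplified stand-ins for the three instructions named above. The 16-byte vector width and 64-byte block size are assumptions. The key idea is that if the block size divides the page size, a load that stops at the block boundary never touches the next page, so reading past the terminator cannot fault. `find_first_zero` scans byte by byte here, whereas the real instruction checks the bytes in parallel; all function names are hypothetical.

```python
VECTOR_BYTES = 16  # assumed vector register width in bytes


def load_count_to_block_boundary(addr: int, block_size: int) -> int:
    """Bytes loadable from addr without crossing the next block boundary (capped at 16)."""
    return min(VECTOR_BYTES, block_size - addr % block_size)


def vector_load_to_block_boundary(memory: bytearray, addr: int, block_size: int) -> bytearray:
    """Load up to 16 bytes, stopping at the block boundary."""
    return memory[addr:addr + load_count_to_block_boundary(addr, block_size)]


def find_first_zero(vec: bytearray) -> int:
    """Index of the first zero byte, or len(vec) if there is none."""
    for i, byte in enumerate(vec):
        if byte == 0:
            return i
    return len(vec)


def copy_terminated(memory: bytearray, src: int, dst: int, block_size: int = 64) -> None:
    """Copy the zero-terminated string at src to dst, including the terminator."""
    while True:
        chunk = vector_load_to_block_boundary(memory, src, block_size)
        if not chunk:
            raise ValueError("reached the end of memory without finding a terminator")
        idx = find_first_zero(chunk)
        if idx < len(chunk):
            memory[dst:dst + idx + 1] = chunk[:idx + 1]  # copy up to and including the terminator
            return
        memory[dst:dst + len(chunk)] = chunk
        src += len(chunk)
        dst += len(chunk)
```

    For a string starting 4 bytes before a block boundary, the first load returns only 4 bytes, and subsequent loads are block-aligned and return full 16-byte vectors.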

    Run-time instrumentation indirect sampling by address
    Type: Granted patent
    Status: In force

    Publication No.: US09405541B2

    Publication Date: 2016-08-02

    Application No.: US13422550

    Filing Date: 2012-03-16

    IPC Classification: G06F11/34 G06F11/36 G06F9/30

    Abstract: The invention relates to implementing run-time instrumentation indirect sampling by address. An aspect of the invention includes a method for implementing run-time instrumentation indirect sampling by address. The method includes reading sample-point addresses from a sample-point address array, and comparing, by a processor, the sample-point addresses to an address associated with an instruction from an instruction stream executing on the processor. The method further includes recognizing a sample point upon execution of the instruction associated with the address matching one of the sample-point addresses. Run-time instrumentation information is obtained from the sample point. The method also includes storing the run-time instrumentation information in a run-time instrumentation program buffer as a reporting group.
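
    The method steps above map onto a short loop: read the sample-point address array, compare each executed instruction's address against it, and on a match append a reporting group to the program buffer. This is a sketch under simplifying assumptions. The instruction stream is a list of (address, mnemonic) pairs, and `ReportingGroup` is a hypothetical record far smaller than real run-time instrumentation data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ReportingGroup:
    """Hypothetical, heavily simplified record of run-time instrumentation information."""
    address: int
    opcode: str
    sequence: int  # position in the executed instruction stream


def sample_by_address(stream: Iterable[Tuple[int, str]],
                      sample_point_addresses: List[int],
                      program_buffer: List[ReportingGroup]) -> None:
    """Append a reporting group whenever an executed instruction's address is a sample point."""
    targets = set(sample_point_addresses)  # read once from the sample-point address array
    for seq, (address, opcode) in enumerate(stream):
        # Executing the instruction itself is outside the scope of this sketch.
        if address in targets:
            program_buffer.append(ReportingGroup(address, opcode, seq))
```

    An instruction at a sampled address is recorded every time it executes, for example on each iteration of a loop.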

    Run-time instrumentation indirect sampling by instruction operation code
    Type: Granted patent
    Status: In force

    Publication No.: US09367316B2

    Publication Date: 2016-06-14

    Application No.: US13422563

    Filing Date: 2012-03-16

    Abstract: Embodiments of the invention relate to implementing run-time instrumentation indirect sampling by instruction operation code. An aspect of the invention includes reading sample-point instruction operation codes from a sample-point instruction array, and comparing, by a processor, the sample-point instruction operation codes to an operation code of an instruction from an instruction stream executing on the processor. A sample point is recognized upon execution of the instruction with the operation code matching one of the sample-point instruction operation codes. The run-time instrumentation information is obtained from the sample point. The run-time instrumentation information is stored in a run-time instrumentation program buffer as a reporting group.
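
    Sampling by operation code follows the same pattern as sampling by address, except that the comparison key is the opcode. As a result, every execution of a listed opcode is sampled wherever it appears in the program. In this minimal sketch, the stream format, the example mnemonics, and the dictionary used as a reporting group are all assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def sample_by_opcode(stream: Iterable[Tuple[int, str]],
                     sample_point_opcodes: List[str],
                     program_buffer: List[Dict[str, object]]) -> None:
    """Append a reporting group for every executed instruction whose opcode is a sample point."""
    targets = set(sample_point_opcodes)  # read once from the sample-point instruction array
    for address, opcode in stream:
        if opcode in targets:
            # Hypothetical reporting-group contents; real ones carry much more state.
            program_buffer.append({"opcode": opcode, "address": address})
```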

    Performing predecode-time optimized instructions in conjunction with predecode time optimized instruction sequence caching
    Type: Granted patent
    Status: In force

    Publication No.: US09354888B2

    Publication Date: 2016-05-31

    Application No.: US13432357

    Filing Date: 2012-03-28

    IPC Classification: G06F9/38 G06F9/30

    Abstract: A method for performing predecode-time optimized instructions in conjunction with predecode-time optimized instruction sequence caching. The method includes receiving a first instruction of an instruction sequence and a second instruction of the instruction sequence, and determining whether the first instruction and the second instruction can be optimized. In response to determining that the first and second instructions can be optimized, the method includes performing a pre-decode optimization on the instruction sequence and generating a new second instruction, wherein the new second instruction is not dependent on a target operand of the first instruction, and storing a pre-decoded first instruction and a pre-decoded new second instruction in an instruction cache. In response to determining that the first and second instructions cannot be optimized, the method includes storing the pre-decoded first instruction and a pre-decoded second instruction in the instruction cache.
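
    A concrete instance of the optimization is an add-immediate whose result feeds a load's base register. The displacement can be folded into the load so that the new second instruction reads the add's source register instead of its target. The first instruction is still kept, because later code may read its result. This sketch assumes a tiny two-opcode instruction format, 4-byte instructions, and a 16-bit signed displacement limit; all names are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class Insn(NamedTuple):
    op: str   # "addi" (rt = ra + imm) or "load" (rt = mem[ra + imm])
    rt: int   # target register
    ra: int   # source / base register
    imm: int  # immediate or displacement


def fits_disp16(value: int) -> bool:
    return -0x8000 <= value <= 0x7FFF


def predecode_optimize(first: Insn, second: Insn) -> Tuple[Insn, Insn]:
    """Return the (first, second) pair to store in the instruction cache."""
    if (first.op == "addi" and second.op == "load"
            and second.ra == first.rt          # second depends on first's target
            and first.ra != first.rt           # first must not overwrite its own source
            and fits_disp16(first.imm + second.imm)):
        # Fold the add into the displacement so the load reads first.ra directly.
        return first, Insn("load", second.rt, first.ra, first.imm + second.imm)
    return first, second  # cannot optimize: store both unchanged


def fill_icache(icache: Dict[int, Insn], address: int, first: Insn, second: Insn) -> None:
    """Store the (possibly rewritten) pair at consecutive 4-byte slots."""
    icache[address], icache[address + 4] = predecode_optimize(first, second)
```

    The `first.ra != first.rt` guard matters: for `addi r3, r3, 16`, the rewritten load would read the already-updated r3 and compute the wrong address.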

    Reducing register read ports for register pairs
    Type: Granted patent
    Status: In force

    Publication No.: US09323529B2

    Publication Date: 2016-04-26

    Application No.: US13552099

    Filing Date: 2012-07-18

    IPC Classification: G06F9/30 G06F15/76

    Abstract: Embodiments relate to reducing the number of read ports for register pairs. An aspect includes executing an instruction. The instruction identifies a pair of registers as containing a wide operand which spans the pair of registers. It is determined whether a pairing indicator associated with the pair of registers has a first value or a second value. The first value indicates that the wide operand is stored in a wide register, and the second value indicates that the wide operand is not stored in the wide register. Based on the pairing indicator having the first value, the wide operand is read from the wide register. Based on the pairing indicator having the second value, the wide operand is read from the pair of registers. An operation is performed using the wide operand.
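
    The pairing-indicator logic can be sketched as a register file that keeps a wide copy alongside the narrow registers. A wide read uses one port when the indicator says the wide copy is current, and two ports otherwise. The even/odd pairing, the 64-bit register width, and counting one read port per register accessed are assumptions for illustration; the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple

MASK64 = (1 << 64) - 1


class PairedRegisterFile:
    """General registers plus a hypothetical wide register shadowing even/odd pairs."""

    def __init__(self, count: int = 16) -> None:
        self.gpr: List[int] = [0] * count
        self.wide: Dict[int, int] = {}             # keyed by the even register of the pair
        self.paired: List[bool] = [False] * count  # pairing indicator, kept on the even register

    def write_wide(self, even: int, value: int) -> None:
        """A wide result updates both halves and the wide copy, and sets the indicator."""
        self.gpr[even] = (value >> 64) & MASK64
        self.gpr[even + 1] = value & MASK64
        self.wide[even] = value & ((1 << 128) - 1)
        self.paired[even] = True

    def write_gpr(self, reg: int, value: int) -> None:
        """A narrow write makes the wide copy stale, so clear the pair's indicator."""
        self.gpr[reg] = value & MASK64
        self.paired[reg & ~1] = False

    def read_wide(self, even: int) -> Tuple[int, int]:
        """Return (wide operand, number of read ports used)."""
        if self.paired[even]:
            return self.wide[even], 1
        return (self.gpr[even] << 64) | self.gpr[even + 1], 2
```

    Both read paths return the same operand; the indicator only decides how many ports it costs to obtain it.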
