PROCESSOR HAVING INCREASED PERFORMANCE VIA ELIMINATION OF SERIAL DEPENDENCIES
    1.
    Type: invention application
    Status: under examination (published)

    Publication number: US20120166769A1

    Publication date: 2012-06-28

    Application number: US12979946

    Filing date: 2010-12-28

    IPC classification: G06F9/40 G06F9/38

    CPC classification: G06F9/3838 G06F9/3017

    Abstract: Methods and apparatuses are provided for achieving increased performance via elimination of serial dependencies in instructions or instruction sequences. The apparatus comprises an operational unit for determining whether an instruction will cause dependencies during completion in an execution unit. Responsive to that determination the instruction is replaced with an alternative instruction for completion in the execution unit. In this way, the alternative instruction is completed without causing dependencies in the execution unit. The method comprises determining that an instruction will cause dependencies during completion in a processor and replacing the instruction with an alternative instruction for completion in the processor.

    Reliable execution using compare and transfer instruction on an SMT machine
    2.
    Type: granted patent
    Status: in force

    Publication number: US08082425B2

    Publication date: 2011-12-20

    Application number: US12432146

    Filing date: 2009-04-29

    IPC classification: G06F9/46 G06F11/14

    Abstract: A system and method for efficient reliable execution on a simultaneous multithreading machine. A processor is placed in a reliable execution mode (REM) to detect possible errors during execution of a software application. Only two threads may be configured to operate in this mode. Floating-point store and integer-transfer unary instructions may be converted to new instructions. Each new instruction has two source operands, each corresponding to a different thread is specified by a same logical register number as a single source operand of the original unary instruction. All other instructions are replicated, wherein the original instruction and its twin are assigned to different threads. Simultaneous multi-threaded (SMT) floating-point logic may only be able to provide lockstep execution when it communicates using the new instruction with instantiated integer independent clusters. The new instruction cannot begin until both source operands are ready, which are subsequently compared to determine any mismatches or errors.
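
The compare-and-transfer behavior described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `ThreadRegisters`, `MismatchError`, and `compare_and_transfer` are hypothetical, and "operand ready" is modeled simply as the logical register having a value in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class MismatchError(Exception):
    """Raised when the two redundant threads disagree on a value (hypothetical)."""


@dataclass
class ThreadRegisters:
    # Assumption: a logical register counts as "ready" once it has an entry here.
    values: Dict[int, int] = field(default_factory=dict)


def compare_and_transfer(reg: int,
                         t0: ThreadRegisters,
                         t1: ThreadRegisters) -> Optional[int]:
    """Model of the converted (two-source) instruction.

    Both sources use the same logical register number, one per thread.
    Returns None while either source is not ready, the agreed value when
    both copies match, and raises MismatchError when they differ.
    """
    if reg not in t0.values or reg not in t1.values:
        return None  # cannot begin until both source operands are ready
    a = t0.values[reg]
    b = t1.values[reg]
    if a != b:
        raise MismatchError(f"register {reg}: {a:#x} != {b:#x}")
    return a  # value is transferred only after the comparison succeeds
```

In this sketch a caller would retry while the result is `None`, which stands in for the instruction waiting on its operands.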

    THREE OPERAND INSTRUCTION EXTENSION FOR X86 ARCHITECTURE
    3.
    Type: invention application
    Status: in force

    Publication number: US20090031116A1

    Publication date: 2009-01-29

    Application number: US11954623

    Filing date: 2007-12-12

    IPC classification: G06F9/30

    Abstract: A method and apparatus are contemplated for increasing the number of available instructions in an instruction set architecture. The new instructions extend the number of general-purpose registers and include three or more operands. A combination of an escape code field, an opcode field, an operation configuration field and an operation size field determines a unique new instruction operation. A source operand extension field includes bits to be combined with other fields in order to extend the number of source operand values for general-purpose registers.

    Superscalar register-renaming for a stack-addressed architecture
    4.
    Type: granted patent
    Status: in force

    Publication number: US08539397B2

    Publication date: 2013-09-17

    Application number: US12482977

    Filing date: 2009-06-11

    IPC classification: G06F17/50

    Abstract: A system and method for increasing processor throughput by decreasing a loop critical path. In one embodiment, a table comprises multiple stack entries, each comprising an x87 floating-point (FP) stack specifier. The combinatorial logic for operand translation of N FP instructions per clock cycle may require N instantiated copies of a combinatorial logic block. Each instantiated copy may determine a new ordering of the stack entries. Control logic may receive necessary information from the corresponding N FP instructions and determine a corresponding combined computational effect, or stack reordering, on entries within the table based on two or more instructions. Resulting control signals are conveyed to the N instantiated copies. A resulting accumulative delay from an input of the first copy to the output of the Nth copy may be less than or equal to (N−1)*time_delay versus a longer N*time_delay.
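
The idea of a "combined computational effect, or stack reordering" of several instructions can be sketched as composition of permutations. The Python below is an illustration under stated assumptions, not the patented circuit: the table is modeled as a list indexed by stack position, and the helpers `fxch`, `push`, `pop`, `apply`, and `compose` are hypothetical names for simplified reorderings. It shows only that each table state can be derived from the original table plus a combined reordering, rather than from the previous copy's output; it does not model gate delay.

```python
from typing import List

STACK_DEPTH = 8  # x87 exposes eight stack positions, ST(0)..ST(7)


def fxch(k: int) -> List[int]:
    """Reordering that swaps stack positions 0 and k (simplified exchange)."""
    perm = list(range(STACK_DEPTH))
    perm[0], perm[k] = perm[k], perm[0]
    return perm


def push() -> List[int]:
    """Assumed model of a push: every entry moves one position deeper."""
    return [(i - 1) % STACK_DEPTH for i in range(STACK_DEPTH)]


def pop() -> List[int]:
    """Assumed model of a pop: every entry moves one position shallower."""
    return [(i + 1) % STACK_DEPTH for i in range(STACK_DEPTH)]


def apply(table: List[str], perm: List[int]) -> List[str]:
    """New position i takes the entry that was at position perm[i]."""
    return [table[j] for j in perm]


def compose(first: List[int], second: List[int]) -> List[int]:
    """Single reordering equivalent to applying `first`, then `second`."""
    return [first[j] for j in second]


def serial_tables(table: List[str], perms: List[List[int]]) -> List[List[str]]:
    """Each step consumes the previous step's output (a serial chain)."""
    out = []
    for perm in perms:
        table = apply(table, perm)
        out.append(table)
    return out


def combined_tables(table: List[str], perms: List[List[int]]) -> List[List[str]]:
    """Each step is computed from the original table and a combined reordering."""
    out = []
    combined = list(range(STACK_DEPTH))
    for perm in perms:
        combined = compose(combined, perm)
        out.append(apply(table, combined))
    return out
```

Both functions yield the same sequence of table states; the second form reflects the abstract's point that control logic can work out the effect of two or more instructions together.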

    COMBINED BYTE-PERMUTE AND BIT SHIFT UNIT
    5.
    Type: invention application
    Status: in force

    Publication number: US20100318771A1

    Publication date: 2010-12-16

    Application number: US12482974

    Filing date: 2009-06-11

    IPC classification: G06F9/06 G06F9/302 G06F9/315

    Abstract: A processor includes a decode unit and a byte permute unit. The byte permute unit receives an instruction from the decode unit. The byte permute unit determines whether the instruction corresponds to a shuffle instruction or a shift instruction. For a shuffle instruction, the byte permute unit uses a byte shuffler to perform a shuffle operation indicated by the instruction. For a shift instruction that indicates a shift magnitude, the byte permute unit uses the byte shuffler to byte-level shift a source operand corresponding to the instruction by an integer number of bytes. The byte permute unit also generates a sequence of output bits by bit-shifting the byte-level shifted source operand by a number of bits such that the sum of the number of bits and the integer number of bytes is equal to the shift magnitude.
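
The two-stage shift in this abstract (a byte-level shift through the shuffler, then a residual bit shift) can be sketched in Python. This is a software illustration only, assuming a little-endian operand and a left shift; `byte_shuffle` and `shift_left` are hypothetical names, and the convention that an index of -1 selects a zero byte is an assumption of the sketch.

```python
from typing import List


def byte_shuffle(src: bytes, indices: List[int]) -> bytes:
    """Output byte i is src[indices[i]]; an index of -1 yields a zero byte."""
    return bytes(src[i] if i >= 0 else 0 for i in indices)


def shift_left(src: bytes, magnitude: int) -> bytes:
    """Left-shift a little-endian operand by `magnitude` bits in two stages."""
    width = len(src)
    # Split the magnitude so that 8 * byte_count + bit_count == magnitude.
    byte_count, bit_count = divmod(magnitude, 8)

    # Stage 1: byte-level shift, reusing the same shuffler as shuffle operations.
    indices = [i - byte_count if i >= byte_count else -1 for i in range(width)]
    byte_shifted = byte_shuffle(src, indices)

    # Stage 2: residual shift of 0..7 bits, truncated to the operand width.
    value = int.from_bytes(byte_shifted, "little")
    value = (value << bit_count) & ((1 << (8 * width)) - 1)
    return value.to_bytes(width, "little")
```

For example, a shift magnitude of 12 is handled as a one-byte shuffle followed by a 4-bit shift.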

    RELIABLE EXECUTION USING COMPARE AND TRANSFER INSTRUCTION ON AN SMT MACHINE
    6.
    Type: invention application
    Status: in force

    Publication number: US20100281239A1

    Publication date: 2010-11-04

    Application number: US12432146

    Filing date: 2009-04-29

    IPC classification: G06F9/30 G06F9/302

    Abstract: A system and method for efficient reliable execution on a simultaneous multithreading machine. A processor is placed in a reliable execution mode (REM) to detect possible errors during execution of a mission critical software application. Only two threads may be configured to operate in this mode. Floating-point store and integer-transfer unary instructions may be converted to new binary instructions. Each new instruction has two source operands, each one corresponding to a different thread is specified by a same logical register number as a single source operand of the original unary instruction. All other instructions are replicated, wherein the original instruction and its twin are assigned to different threads. Simultaneous multi-threaded (SMT) floating-point logic may only be able to provide lockstep execution when it communicates using the new instruction with instantiated integer independent clusters. The new instruction cannot begin until both source operands are ready, which are subsequently compared to determine any mismatches or errors.

    Combined byte-permute and bit shift unit
    7.
    Type: granted patent
    Status: in force

    Publication number: US08909904B2

    Publication date: 2014-12-09

    Application number: US12482974

    Filing date: 2009-06-11

    Abstract: A processor includes a decode unit and a byte permute unit. The byte permute unit receives an instruction from the decode unit. The byte permute unit determines whether the instruction corresponds to a shuffle instruction or a shift instruction. For a shuffle instruction, the byte permute unit uses a byte shuffler to perform a shuffle operation indicated by the instruction. For a shift instruction that indicates a shift magnitude, the byte permute unit uses the byte shuffler to byte-level shift a source operand corresponding to the instruction by an integer number of bytes. The byte permute unit also generates a sequence of output bits by bit-shifting the byte-level shifted source operand by a number of bits such that the sum of the number of bits and the integer number of bytes is equal to the shift magnitude.

    Three operand instruction extension for X86 architecture
    8.
    Type: granted patent
    Status: in force

    Publication number: US07836278B2

    Publication date: 2010-11-16

    Application number: US11954623

    Filing date: 2007-12-12

    IPC classification: G06F9/30

    Abstract: A method and apparatus are contemplated for increasing the number of available instructions in an instruction set architecture. The new instructions extend the number of general-purpose registers and include three or more operands. A combination of an escape code field, an opcode field, an operation configuration field and an operation size field determines a unique new instruction operation. A source operand extension field includes bits to be combined with other fields in order to extend the number of source operand values for general-purpose registers.

    SUPERSCALAR REGISTER-RENAMING FOR A STACK-ADDRESSED ARCHITECTURE
    9.
    Type: invention application
    Status: in force

    Publication number: US20100318772A1

    Publication date: 2010-12-16

    Application number: US12482977

    Filing date: 2009-06-11

    IPC classification: G06F9/305 G06F9/44

    Abstract: A system and method for increasing processor throughput by decreasing a loop critical path. In one embodiment, a table comprises multiple stack entries, each comprising an x87 floating-point (FP) stack specifier. The combinatorial logic for operand translation of N FP instructions per clock cycle may require N instantiated copies of a combinatorial logic block. Each instantiated copy may determine a new ordering of the stack entries. Control logic may receive necessary information from the corresponding N FP instructions and determine a corresponding combined computational effect, or stack reordering, on entries within the table based on two or more instructions. Resulting control signals are conveyed to the N instantiated copies. A resulting accumulative delay from an input of the first copy to the output of the Nth copy may be less than or equal to (N−1)*time_delay versus a longer N*time_delay.
