Branch Predictor with Branch Resolution Code Injection

    Publication No.: US20180173534A1

    Publication Date: 2018-06-21

    Application No.: US15385011

    Application Date: 2016-12-20

    Abstract: A processor may include a decoder to decode a first instance of a branch instruction for which the resolved branch direction is data dependent and add results of the decoding to a stream of decoded instructions for execution. The processor may include a code generator to inject, into the stream of decoded instructions, branch resolution code to resolve the branch condition for a second instance of the branch instruction following the first instance at a predetermined look-ahead distance. The processor may include an execution unit to execute the branch resolution code, storing an indication of the resolved branch direction for the second instance in an entry of a prediction queue for the branch instruction. The processor may include a branch predictor to receive the second instance of the branch instruction, and output the resolved branch direction as the predicted branch direction for the second instance of the branch instruction.
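
    As a rough illustration of the mechanism in this abstract, the following Python sketch models a single data-dependent branch (`if data[i] > threshold`) in software. It assumes a FIFO prediction queue whose entries are tagged with the instance index, a look-ahead distance of at least 1, and a static fallback guess for the first instances that have no queue entry yet; `simulate_branch_resolution` and its parameters are hypothetical names, not taken from the patent.

```python
from collections import deque


def simulate_branch_resolution(data, threshold, distance, fallback=False):
    """Count mispredictions of the branch `data[i] > threshold` when injected
    resolution code pre-computes the outcome `distance` instances ahead.
    This is a hypothetical software model, not the patented hardware."""
    if distance < 1:
        raise ValueError("look-ahead distance must be at least 1")
    prediction_queue = deque()  # entries: (instance index, resolved direction)
    mispredictions = 0
    for i, value in enumerate(data):
        # Branch predictor: consume the queued entry for this instance if the
        # resolution code has already produced one, else use the static guess.
        if prediction_queue and prediction_queue[0][0] == i:
            _, predicted = prediction_queue.popleft()
        else:
            predicted = fallback
        actual = value > threshold
        if predicted != actual:
            mispredictions += 1
        # Injected branch resolution code: evaluate the condition for the
        # instance `distance` iterations ahead and enqueue the result.
        ahead = i + distance
        if ahead < len(data):
            prediction_queue.append((ahead, data[ahead] > threshold))
    return mispredictions


print(simulate_branch_resolution([5, 1, 9, 2, 7, 3], threshold=4, distance=2))  # prints 1
```

    With `distance=2`, instances 0 and 1 fall back to the static not-taken guess (only instance 0 is wrong), and every instance from index 2 onward uses a direction that the resolution code has already computed.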

    Apparatus and method for speculative conditional move operation

    Publication No.: US11188342B2

    Publication Date: 2021-11-30

    Application No.: US16837824

    Application Date: 2020-04-01

    Abstract: An apparatus and method for a speculative conditional move instruction. A processor comprising: a decoder to decode a first speculative conditional move instruction; a prediction storage to store prediction data related to previously executed speculative conditional move instructions; and execution circuitry to read first prediction data associated with the speculative conditional move instruction and to execute the speculative conditional move instruction either speculatively or non-speculatively based on the first prediction data.
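
    The speculate-or-not decision described in this abstract can be sketched as a toy Python model. It assumes (none of this is specified in the abstract) that the prediction storage is a table keyed by instruction address, that each entry holds the most recently resolved condition plus a 2-bit saturating confidence counter, and that the instruction executes speculatively only when confidence reaches a threshold; `SpeculativeCmovModel` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PredictionEntry:
    predicted_condition: bool = False  # condition outcome seen most recently
    confidence: int = 0                # saturating counter, 0..MAX_CONFIDENCE


class SpeculativeCmovModel:
    """Toy model: choose speculative vs. non-speculative execution of a
    conditional move (dst = src if condition else dst) from per-address
    prediction data. All names and thresholds are hypothetical."""

    MAX_CONFIDENCE = 3
    SPECULATE_THRESHOLD = 2

    def __init__(self):
        self.prediction_storage = {}  # instruction address -> PredictionEntry
        self.speculative_executions = 0
        self.mispeculations = 0

    def execute(self, address, condition, src, dst):
        entry = self.prediction_storage.setdefault(address, PredictionEntry())
        if entry.confidence >= self.SPECULATE_THRESHOLD:
            # Speculative path: an operand chosen by the predicted condition
            # would be forwarded before the real condition resolves.
            self.speculative_executions += 1
            if entry.predicted_condition != condition:
                # A real core would flush and re-execute dependent work here.
                self.mispeculations += 1
        # Otherwise the non-speculative path simply waits for the condition.

        # Update the prediction data with the resolved condition.
        if condition == entry.predicted_condition:
            entry.confidence = min(entry.confidence + 1, self.MAX_CONFIDENCE)
        else:
            entry.predicted_condition = condition
            entry.confidence = 0

        # Once the condition resolves, both paths commit the same result.
        return src if condition else dst


model = SpeculativeCmovModel()
results = [model.execute(0x40, c, 7, 3) for c in [True, True, True, True, False, True]]
print(results, model.speculative_executions, model.mispeculations)
```

    For this condition sequence the model speculates on the fourth and fifth executions and mis-speculates once, when the condition flips to false; a stable condition keeps the counter saturated, while a flip resets it and forces non-speculative execution until confidence rebuilds.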

    Apparatus and method for improving power-performance using a software analysis routine

    Publication No.: US10209764B2

    Publication Date: 2019-02-19

    Application No.: US15385184

    Application Date: 2016-12-20

    Abstract: Embodiments described herein relate to improving processor power-performance using a binary analyzer routine. In one example, a processor includes a memory interface to couple to a memory, at least one hardware accelerator circuit, and an execution pipeline including at least fetch, decode, and execute stages, wherein the processor, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, is to switch context to a binary analyzer routine stored in the memory, the binary analyzer routine including instructions that, when fetched, decoded, and executed by the processor, cause the processor to analyze a region in the memory containing the hot-spot sequence, analyze hardware metrics relating to execution of the hot-spot sequence, and generate, based on the analyses, a recommendation for the at least one hardware accelerator circuit to improve at least one of power consumption and performance.
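
    A software sketch of the trigger-and-analyze flow follows. It assumes that a hot spot is detected by counting executions of a loop-head address, and it stands in for the binary analyzer with a deliberately simple, hypothetical heuristic (the share of vector mnemonics in the region plus a cache-miss metric). The context switch to the routine is not modeled, and all names, thresholds, and recommendation strings are illustrative.

```python
from collections import Counter


class HotSpotDetector:
    """Counts executions per loop-head address and signals a hot spot once."""

    def __init__(self, threshold=3):  # hypothetical trigger count
        self.threshold = threshold
        self.counts = Counter()
        self.reported = set()

    def observe(self, loop_head):
        """Return True exactly once, when `loop_head` first reaches the threshold."""
        self.counts[loop_head] += 1
        if self.counts[loop_head] >= self.threshold and loop_head not in self.reported:
            self.reported.add(loop_head)
            return True
        return False


def binary_analyzer(region, metrics):
    """Hypothetical analyzer: inspect the hot region's instructions and the
    hardware metrics, then return a recommendation for the accelerator."""
    if not region:
        return "no recommendation"
    vector_share = sum(op.startswith("v") for op in region) / len(region)
    if vector_share >= 0.5:
        return "enable vector accelerator"
    if metrics.get("cache_miss_rate", 0.0) > 0.1:
        return "enable prefetch accelerator"
    return "power down accelerator"


detector = HotSpotDetector()
for _ in range(5):
    if detector.observe(0x1000):  # the "hot-spot hardware event"
        print(binary_analyzer(["vadd", "vmul", "load", "add"], {"cache_miss_rate": 0.02}))
```

    Reporting each hot spot only once mirrors the idea that the analysis cost is paid once per hot region rather than on every iteration.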

    Instructions for vector operations with constant values

    Publication No.: US11579881B2

    Publication Date: 2023-02-14

    Application No.: US15638074

    Application Date: 2017-06-29

    Abstract: Disclosed embodiments relate to instructions for vector operations with immediate values. In one example, a system includes a memory and a processor that includes fetch circuitry to fetch the instruction from a code storage, the instruction including an opcode, a destination identifier to specify a destination vector register, a first immediate, and a write mask identifier to specify a write mask register, the write mask register including at least one bit corresponding to each destination vector register element, the at least one bit to specify whether the destination vector register element is masked or unmasked, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, to use the write mask register to determine unmasked elements of the destination vector register, and, when the opcode specifies to broadcast, broadcast the first immediate to one or more unmasked vector elements of the destination vector register.
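
    The element-wise semantics of a masked broadcast of an immediate can be modeled in a few lines of Python. Bit i of the write mask controls destination element i, as the abstract describes. What happens to masked elements is not stated in the abstract, so the sketch offers both merging (keep the old value) and zeroing as an assumption, and it also assumes the immediate is truncated to the element width; `masked_broadcast` is a hypothetical helper.

```python
def masked_broadcast(dest, write_mask, imm, element_bits=32, zeroing=False):
    """Model a write-masked broadcast of an immediate into a vector register.

    Bit i of `write_mask` enables element i of `dest`. Merging vs. zeroing of
    masked elements and truncation to `element_bits` are assumptions.
    """
    value = imm & ((1 << element_bits) - 1)  # two's-complement truncation to the lane width
    result = []
    for i, old in enumerate(dest):
        if (write_mask >> i) & 1:
            result.append(value)  # unmasked element receives the immediate
        elif zeroing:
            result.append(0)      # masked element cleared (zeroing variant)
        else:
            result.append(old)    # masked element preserved (merging variant)
    return result


print(masked_broadcast([1, 2, 3, 4], 0b0101, 7))
```

    Here elements 0 and 2 are unmasked, so the call yields `[7, 2, 7, 4]`; with `zeroing=True` it would yield `[7, 0, 7, 0]`.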

    INSTRUCTIONS FOR VECTOR OPERATIONS WITH CONSTANT VALUES

    Publication No.: US20190004801A1

    Publication Date: 2019-01-03

    Application No.: US15638074

    Application Date: 2017-06-29

    Abstract: Disclosed embodiments relate to instructions for vector operations with immediate values. In one example, a system includes a memory and a processor that includes fetch circuitry to fetch the instruction from a code storage, the instruction including an opcode, a destination identifier to specify a destination vector register, a first immediate, and a write mask identifier to specify a write mask register, the write mask register including at least one bit corresponding to each destination vector register element, the at least one bit to specify whether the destination vector register element is masked or unmasked, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, to use the write mask register to determine unmasked elements of the destination vector register, and, when the opcode specifies to broadcast, broadcast the first immediate to one or more unmasked vector elements of the destination vector register.

    APPARATUS AND METHOD FOR IMPROVING POWER-PERFORMANCE USING A SOFTWARE ANALYSIS ROUTINE

    Publication No.: US20180173291A1

    Publication Date: 2018-06-21

    Application No.: US15385184

    Application Date: 2016-12-20

    CPC classification number: G06F9/3867 G06F1/3206 G06F9/30065

    Abstract: Embodiments described herein relate to improving processor power-performance using a binary analyzer routine. In one example, a processor includes a memory interface to couple to a memory, at least one hardware accelerator circuit, and an execution pipeline including at least fetch, decode, and execute stages, wherein the processor, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, is to switch context to a binary analyzer routine stored in the memory, the binary analyzer routine including instructions that, when fetched, decoded, and executed by the processor, cause the processor to analyze a region in the memory containing the hot-spot sequence, analyze hardware metrics relating to execution of the hot-spot sequence, and generate, based on the analyses, a recommendation for the at least one hardware accelerator circuit to improve at least one of power consumption and performance.

    Method and apparatus for performance efficient ISA virtualization using dynamic partial binary translation

    Publication No.: US09552207B2

    Publication Date: 2017-01-24

    Application No.: US15013993

    Application Date: 2016-02-02

    Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.
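
    The region-based partial translation flow can be sketched in Python under heavy simplification. The sketch assumes opcodes are plain mnemonic strings, that the second core's reduced ISA is a small hypothetical set, that an executable region is bounded by branch instructions, and that `save_state`/`restore_state` placeholders stand for the encapsulation and state recovery the abstract describes. Only invalid opcodes are rewritten, valid ones are copied unchanged, and the result is stored in a translation cache keyed by region start. All names are illustrative.

```python
# Hypothetical reduced ISA supported natively by the low-power second core.
SECOND_ISA = {"add", "sub", "load", "store", "branch"}


def find_region(program, fault_index):
    """Return the half-open range [start, end) of the region around the
    faulting instruction, bounded by branches (an assumed region rule)."""
    start = fault_index
    while start > 0 and program[start - 1] != "branch":
        start -= 1
    end = fault_index
    while end < len(program) - 1 and program[end] != "branch":
        end += 1
    return start, end + 1  # include the terminating branch, if any


def translate_region(program, fault_index, translation_cache):
    """Partially translate the region containing an invalid opcode."""
    start, end = find_region(program, fault_index)
    if start in translation_cache:
        return translation_cache[start]  # reuse an earlier translation
    translated = []
    for op in program[start:end]:
        if op in SECOND_ISA:
            translated.append(op)  # valid opcode: kept as native code
        else:
            # Invalid opcode: wrap its emulation with state save and recovery.
            translated += ["save_state", f"emulate_{op}", "restore_state"]
    translation_cache[start] = translated
    return translated


cache = {}
program = ["load", "vfma", "add", "branch", "sub", "vdiv", "store"]
print(translate_region(program, 1, cache))
```

    Translating a whole region on the first fault, rather than trapping on each invalid opcode, lets later invalid opcodes in the same region run from the cached translation without further traps.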

    METHOD AND APPARATUS FOR PERFORMANCE EFFICIENT ISA VIRTUALIZATION USING DYNAMIC PARTIAL BINARY TRANSLATION

    Publication No.: US20150370567A1

    Publication Date: 2015-12-24

    Application No.: US14840014

    Application Date: 2015-08-30

    Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.
