Apparatus and method for efficient call/return emulation using a dual return stack buffer

    Publication No.: US10545735B2

    Publication date: 2020-01-28

    Application No.: US15813021

    Application date: 2017-11-14

    Abstract: An apparatus and method for a dual return stack buffer (RSB) for use in binary translation systems. For example, one embodiment of a processor comprises: a dual return stack buffer (DRSB) comprising a native RSB and an extended RSB (XRSB), the dual RSB to be used within a binary translation execution environment in which guest call-return instruction sequences are translated to native call-return instruction sequences to be executed directly by the processor; the native RSB to store native return addresses associated with the native call-return instruction sequences; and the XRSB to store emulated return addresses associated with the guest call-return instruction sequences, wherein each native return address stored in the RSB is associated with an emulated return address stored in the XRSB.
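
The pairing of native and emulated return addresses described in this abstract can be illustrated with a minimal Python sketch. It is a toy software model, not the patented hardware: the class name `DualReturnStackBuffer`, the `depth` parameter, and the rule that a prediction is used only when the emulated address matches are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Optional


class DualReturnStackBuffer:
    """Toy model: entry i of the native RSB is paired with entry i of the XRSB."""

    def __init__(self, depth: int = 16) -> None:
        # Bounded deques drop the oldest entry on overflow, so both
        # buffers lose the same entry and stay paired (an assumption).
        self.native_rsb: Deque[int] = deque(maxlen=depth)
        self.xrsb: Deque[int] = deque(maxlen=depth)

    def call(self, native_ret: int, guest_ret: int) -> None:
        """Record a translated call: push both return addresses together."""
        self.native_rsb.append(native_ret)
        self.xrsb.append(guest_ret)

    def ret(self, actual_guest_ret: int) -> Optional[int]:
        """Pop one pair; return the native address only if the emulated
        address matches the guest's actual return target, else None."""
        if not self.native_rsb:
            return None
        native_ret = self.native_rsb.pop()
        guest_ret = self.xrsb.pop()
        return native_ret if guest_ret == actual_guest_ret else None
```

In this sketch a `None` result stands for the case where the translated return cannot be taken directly and some slower lookup would be needed; how the real design handles that case is not stated in the abstract.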

    Method and apparatus for implementing a dynamic out-of-order processor pipeline

    Publication No.: US09612840B2

    Publication date: 2017-04-04

    Application No.: US14228690

    Application date: 2014-03-28

    Abstract: A hardware/software co-design for an optimized dynamic out-of-order Very Long Instruction Word (VLIW) pipeline. For example, one embodiment of an apparatus comprises: an instruction fetch unit to fetch Very Long Instruction Words (VLIWs) in their program order from memory, each of the VLIWs comprising a plurality of reduced instruction set computing (RISC) instruction syllables grouped into the VLIWs in an order which removes data-flow dependencies and false output dependencies between the syllables; a decode unit to decode the VLIWs in their program order and output the syllables of each decoded VLIW in parallel; and an out-of-order execution engine to execute the syllables preferably in parallel with other syllables, wherein at least some of the syllables are to be executed in a different order than the order in which they are received from the decode unit, the out-of-order execution engine having one or more processing stages which do not check for data-flow dependencies and false output dependencies between the syllables when performing operations.
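
The idea of grouping syllables so that no data-flow or output dependency exists inside a single VLIW can be sketched as a greedy packer. This is only an illustration under stated assumptions: the function name `pack_vliws`, the `(dest, sources)` syllable representation, and the `width` limit are hypothetical, and the abstract does not say the grouping is done greedily.

```python
from typing import List, Set, Tuple

Syllable = Tuple[str, Tuple[str, ...]]  # (destination register, source registers)


def pack_vliws(syllables: List[Syllable], width: int = 4) -> List[List[Syllable]]:
    """Greedily pack syllables, in program order, into VLIWs such that no
    syllable reads (data-flow) or rewrites (output dependency) a register
    written by an earlier syllable of the same VLIW."""
    vliws: List[List[Syllable]] = []
    current: List[Syllable] = []
    written: Set[str] = set()
    for dest, srcs in syllables:
        conflict = dest in written or any(s in written for s in srcs)
        if current and (conflict or len(current) == width):
            # Close the current VLIW and start a new one.
            vliws.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        vliws.append(current)
    return vliws
```

Because no syllable inside a packed group depends on another in these two ways, a consumer of the group would not need to check for those dependencies within it, which is the property the abstract attributes to some stages of the execution engine.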

    INSTRUCTION AND LOGIC FOR BULK REGISTER RECLAMATION

    Patent application (under examination, published)

    Publication No.: US20160092222A1

    Publication date: 2016-03-31

    Application No.: US14496113

    Application date: 2014-09-25

    CPC classification number: G06F9/30185 G06F9/384 G06F9/3857

    Abstract: A processor includes a front end, a decoder, an allocator, and a retirement unit. The decoder includes logic to identify an end-of-live-range (EOLR) indicator. The EOLR indicator specifies an architectural register and a location in code for which the architectural register is unused. The allocator includes logic to scan for a mapping of the architectural register to a physical register, based upon the EOLR indicator. The allocator also includes logic to generate a request to disassociate the architectural register from the physical register. The retirement unit includes logic to disassociate the architectural register from the physical register.
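
The decode, scan, request, and disassociate steps named in this abstract can be sketched with a small rename-table model. Everything below is an illustrative assumption: the names `EolrIndicator`, `RenameTable`, `scan_for_mapping`, and `disassociate`, and the use of a free list, do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EolrIndicator:
    """Hypothetical decoded end-of-live-range indicator."""
    arch_reg: str        # architectural register that is no longer live
    code_location: int   # location in code from which it is unused


class RenameTable:
    def __init__(self, num_physical: int) -> None:
        self.free_list: List[int] = list(range(num_physical))
        self.mapping: Dict[str, int] = {}  # architectural -> physical

    def allocate(self, arch_reg: str) -> int:
        # Toy model: assumes arch_reg is not currently mapped.
        phys = self.free_list.pop(0)
        self.mapping[arch_reg] = phys
        return phys

    def scan_for_mapping(self, eolr: EolrIndicator) -> Optional[int]:
        """Allocator step: find the physical register to release, if any."""
        return self.mapping.get(eolr.arch_reg)

    def disassociate(self, arch_reg: str, phys: Optional[int]) -> bool:
        """Retirement step: drop the mapping and free the physical register."""
        if phys is None or self.mapping.get(arch_reg) != phys:
            return False
        del self.mapping[arch_reg]
        self.free_list.append(phys)
        return True
```

Splitting the work into `scan_for_mapping` and `disassociate` mirrors the abstract's division between the allocator, which generates the request, and the retirement unit, which carries it out.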

    Instruction length decoding

    Granted patent

    Publication No.: US12248785B2

    Publication date: 2025-03-11

    Application No.: US17062556

    Application date: 2020-10-03

    Abstract: A processor includes a binary translator and a decoder. The binary translator includes logic to analyze a stream of atomic instructions, identify words by boundary bits in the atomic instructions, generate a mask to identify the words, and load the mask and the plurality of words into an instruction cache line. The words include atomic instructions. At least one word includes more than one atomic instruction. The decoder includes logic to apply the mask to identify a first word from the instruction cache line and decode the first word based upon the applied mask.
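
A minimal Python sketch of the mask idea follows. The encoding is assumed for illustration only: each atomic instruction is modeled as a `(payload, is_last)` pair whose boundary bit marks the last atom of a word, and bit `i` of the mask is set when atom `i` begins a word. The function names `build_mask` and `first_word` are hypothetical.

```python
from typing import List, Tuple

Atom = Tuple[str, bool]  # (payload, boundary bit: True if last atom of its word)


def build_mask(atoms: List[Atom]) -> int:
    """Translator side: set bit i of the mask when atom i starts a word."""
    mask = 0
    starts_word = True
    for i, (_, is_last) in enumerate(atoms):
        if starts_word:
            mask |= 1 << i
        starts_word = is_last  # the atom after a boundary starts a new word
    return mask


def first_word(atoms: List[Atom], mask: int) -> List[str]:
    """Decoder side: use the mask to find where the first word ends,
    without re-inspecting the boundary bits of each atom."""
    rest = mask >> 1
    length = 1
    while length < len(atoms) and not (rest & 1):
        rest >>= 1
        length += 1
    return [payload for payload, _ in atoms[:length]]
```

For a cache line holding the atoms `a b | c | d e`, `build_mask` sets bits 0, 2, and 3, and `first_word` returns the two payloads of the first word.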

    Apparatus and method for architectural performance monitoring in binary translation systems

    Publication No.: US10387159B2

    Publication date: 2019-08-20

    Application No.: US14614264

    Application date: 2015-02-04

    Abstract: Methods and apparatuses relate to emulating architectural performance monitoring in a binary translation system. In one embodiment, a processor includes an architectural performance counter to maintain an architectural value associated with instruction execution, a register to store the architectural value of the architectural performance counter, binary translation logic to embed an architectural value from the architectural performance counter into a stream of translated instructions having a transactional code region and to store the architectural value into the register, and an execution unit to execute the transactional code region of the stream of translated instructions. The binary translation logic is configured to add the architectural value from the register to the architectural performance counter upon completion of the transactional code region of the stream of translated instructions. In one embodiment, a binary translation system overcomes software incompatibilities by using microarchitectural support to transparently and accurately emulate architectural performance counter behavior.
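
The deferred update described here, where a value is held in a register and added to the architectural counter only when the transactional region completes, can be sketched as follows. The class and method names are hypothetical, and the abort behavior (discarding the pending value) is an assumption not stated in the abstract.

```python
class PerfCounterEmulator:
    """Toy model of deferred architectural performance-counter updates."""

    def __init__(self) -> None:
        self.counter = 0   # architectural performance counter
        self.pending = 0   # register holding the value for the current region

    def begin_region(self, architectural_value: int) -> None:
        # The translator embeds the value (for example, the number of guest
        # instructions the region represents) and stores it in the register.
        self.pending = architectural_value

    def commit_region(self) -> None:
        # On completion of the transactional region, fold the value in.
        self.counter += self.pending
        self.pending = 0

    def abort_region(self) -> None:
        # Assumption: an aborted region contributes nothing to the counter.
        self.pending = 0
```

Keeping the value in a separate register until commit means the visible counter never reflects work from a region that did not complete, which is one way to read the abstract's claim of accurate emulation.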

    Method and apparatus for implementing a dynamic out-of-order processor pipeline

    Publication No.: US10338927B2

    Publication date: 2019-07-02

    Application No.: US15477374

    Application date: 2017-04-03

    Abstract: A hardware/software co-design for an optimized dynamic out-of-order Very Long Instruction Word (VLIW) pipeline. For example, one embodiment of an apparatus comprises: an instruction fetch unit to fetch Very Long Instruction Words (VLIWs) in their program order from memory, each of the VLIWs comprising a plurality of reduced instruction set computing (RISC) instruction syllables grouped into the VLIWs in an order which removes data-flow dependencies and false output dependencies between the syllables; a decode unit to decode the VLIWs in their program order and output the syllables of each decoded VLIW in parallel; and an out-of-order execution engine to execute the syllables preferably in parallel with other syllables, wherein at least some of the syllables are to be executed in a different order than the order in which they are received from the decode unit, the out-of-order execution engine having one or more processing stages which do not check for data-flow dependencies and false output dependencies between the syllables when performing operations.
