Fast unaligned memory access
    1.
    Granted patent

    Publication number: US10360031B2

    Publication date: 2019-07-23

    Application number: US14376825

    Filing date: 2011-10-21

    Abstract: Fast unaligned memory access. In accordance with a first embodiment of the present invention, a computing device includes a load queue memory structure configured to queue load operations and a store queue memory structure configured to queue store operations. The computing device also includes at least one bit configured to indicate the presence of an unaligned address component for an entry of said load queue memory structure, and at least one bit configured to indicate the presence of an unaligned address component for an entry of said store queue memory structure. The load queue memory may also include memory configured to indicate data forwarding of an unaligned address component from said store queue memory structure to said load queue memory structure.
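The queue entries described above can be sketched as follows. This is a minimal illustrative model, not the patented implementation; all names (`LoadQueueEntry`, `crosses_alignment_boundary`, the 8-byte line size) are assumptions for illustration. Each entry carries a bit marking an unaligned address component, and a load entry can record that its unaligned component was forwarded from the store queue.

```python
from dataclasses import dataclass

def crosses_alignment_boundary(address: int, size: int, line_size: int = 8) -> bool:
    """True when the access [address, address + size) spans an alignment boundary."""
    return (address // line_size) != ((address + size - 1) // line_size)

@dataclass
class LoadQueueEntry:
    address: int
    size: int
    unaligned: bool = False             # bit indicating an unaligned address component
    forwarded_from_store: bool = False  # unaligned component forwarded from the store queue

@dataclass
class StoreQueueEntry:
    address: int
    size: int
    unaligned: bool = False             # bit indicating an unaligned address component

def enqueue_load(load_queue: list, address: int, size: int) -> LoadQueueEntry:
    """Queue a load, setting its unaligned bit from the address/size pair."""
    entry = LoadQueueEntry(address, size,
                           unaligned=crosses_alignment_boundary(address, size))
    load_queue.append(entry)
    return entry
```

Marking unalignment at enqueue time lets later pipeline stages (and store-to-load forwarding logic) treat the two address components of a straddling access separately.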

    Instruction sequence buffer to enhance branch prediction efficiency

    Publication number: US09678755B2

    Publication date: 2017-06-13

    Application number: US13879365

    Filing date: 2011-10-12

    CPC classification number: G06F9/3861 G06F9/30058 G06F9/3808 G06F9/3844

    Abstract: A method for outputting alternative instruction sequences. The method includes tracking repetitive hits to determine a set of frequently hit instruction sequences for a microprocessor. A frequently mispredicted branch instruction is identified, wherein the predicted outcome of the branch instruction is frequently wrong. An alternative instruction sequence for the branch instruction target is stored into a buffer. On a subsequent hit to the branch instruction where the predicted outcome of the branch instruction was wrong, the alternative instruction sequence is output from the buffer.
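The mechanism above can be sketched as a small buffer keyed by branch address. This is a hypothetical model (the class name, threshold, and interfaces are illustrative, not from the patent): mispredictions per branch are counted, an alternative sequence is stored once a branch proves frequently wrong, and that sequence is output on a later hit whose prediction fails.

```python
from collections import Counter

class AltSequenceBuffer:
    """Illustrative buffer of alternative instruction sequences."""

    def __init__(self, mispredict_threshold: int = 2):
        self.mispredicts = Counter()   # branch PC -> misprediction count
        self.buffer = {}               # branch PC -> alternative sequence
        self.threshold = mispredict_threshold

    def record_outcome(self, branch_pc, predicted_taken, actual_taken, alt_sequence):
        """Track mispredictions; buffer the alternative path for hot offenders."""
        if predicted_taken != actual_taken:
            self.mispredicts[branch_pc] += 1
            if self.mispredicts[branch_pc] >= self.threshold:
                self.buffer[branch_pc] = alt_sequence

    def fetch(self, branch_pc, prediction_was_wrong: bool):
        """On a hit where the prediction was wrong, output the buffered sequence."""
        if prediction_was_wrong and branch_pc in self.buffer:
            return self.buffer[branch_pc]
        return None
```

Outputting the buffered sequence directly avoids the refetch latency normally paid when a mispredicted branch forces a pipeline redirect.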

    Guest instruction block with near branching and far branching sequence construction to native instruction block
    5.
    Granted patent (in force)

    Publication number: US09542187B2

    Publication date: 2017-01-10

    Application number: US13359817

    Filing date: 2012-01-27

    Abstract: A method for translating instructions for a processor. The method includes accessing a plurality of guest instructions that comprise multiple guest branch instructions comprising at least one guest far branch, and building an instruction sequence from the plurality of guest instructions by using branch prediction on the at least one guest far branch. The method further includes assembling a guest instruction block from the instruction sequence. The guest instruction block is translated to a corresponding native conversion block, wherein the native conversion block includes at least one native far branch that corresponds to the at least one guest far branch, and wherein the at least one native far branch includes an opposite guest address for an opposing branch path of the at least one guest far branch. Upon encountering a misprediction, a correct instruction sequence is obtained by accessing the opposite guest address.
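The recovery path above can be sketched as follows. This is a hypothetical illustration (the class, the `translate` callback, and all addresses are assumptions): each native far branch in a converted block keeps the guest address of the opposite, not-predicted path, so a misprediction is repaired by translating from that stored address rather than restarting from scratch.

```python
class NativeFarBranch:
    """A native far branch that remembers the opposite guest path's address."""

    def __init__(self, predicted_target, opposite_guest_address):
        self.predicted_target = predicted_target          # path chosen by prediction
        self.opposite_guest_address = opposite_guest_address  # the other guest path

def recover_from_misprediction(branch: NativeFarBranch, translate):
    """On a misprediction, obtain the correct instruction sequence by
    translating starting from the stored opposite guest address."""
    return translate(branch.opposite_guest_address)
```

Because the opposite guest address travels with the translated block, recovery needs no reverse mapping from native code back to guest code at misprediction time.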


    Systems and methods for load canceling in a processor that is connected to an external interconnect fabric
    7.
    Granted patent (in force)

    Publication number: US09424046B2

    Publication date: 2016-08-23

    Application number: US13649505

    Filing date: 2012-10-11

    Abstract: Systems and methods for load canceling in a processor that is connected to an external interconnect fabric are disclosed. As a part of a method for load canceling in a processor that is connected to an external bus, and responsive to a flush request and a corresponding cancellation of pending speculative loads from a load queue, a type of one or more of the pending speculative loads that are positioned in the instruction pipeline external to the processor is converted from load to prefetch. Data corresponding to one or more of the pending speculative loads that are positioned in the instruction pipeline external to the processor is accessed and returned to cache as prefetch data. The prefetch data is retired in a cache location of the processor.
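The flush behavior above can be sketched as follows. This is an illustrative model, not the patented logic (the request dictionaries and function names are assumptions): when a flush cancels pending speculative loads, requests already out on the external fabric are converted from load to prefetch, so the data they return is still retired into the cache instead of being dropped.

```python
LOAD, PREFETCH = "load", "prefetch"

def on_flush(load_queue: list, external_requests: list) -> list:
    """Cancel queued speculative loads; convert in-flight external
    requests from load to prefetch so their data is not wasted."""
    load_queue.clear()                  # pending speculative loads cancelled
    for req in external_requests:
        if req["type"] == LOAD:
            req["type"] = PREFETCH      # returning data will be cached, not forwarded
    return external_requests

def on_data_return(cache: dict, req: dict, data: bytes) -> None:
    """Retire returning prefetch data into a cache location."""
    if req["type"] == PREFETCH:
        cache[req["address"]] = data
```

Converting rather than discarding the in-flight requests keeps the external fabric transaction useful: a later demand load to the same address can hit in the cache.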


    INTERCONNECT STRUCTURE TO SUPPORT THE EXECUTION OF INSTRUCTION SEQUENCES BY A PLURALITY OF ENGINES
    9.
    Patent application (in force)

    Publication number: US20120297396A1

    Publication date: 2012-11-22

    Application number: US13475739

    Filing date: 2012-05-18

    Abstract: A global interconnect system. The global interconnect system includes a plurality of resources having data for supporting the execution of multiple code sequences and a plurality of engines for implementing the execution of the multiple code sequences. A plurality of resource consumers are within each of the plurality of engines. A global interconnect structure is coupled to the plurality of resource consumers and coupled to the plurality of resources to enable data access and execution of the multiple code sequences, wherein the resource consumers access the resources through a per-cycle utilization of the global interconnect structure.
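The per-cycle utilization mentioned above can be sketched as a simple arbitration step. This is a hypothetical illustration (the function and port count are assumptions): each cycle, resource consumers across all engines compete for a fixed number of interconnect ports, and requests that do not win a port retry in a later cycle.

```python
def arbitrate_cycle(requests: list, num_ports: int):
    """One cycle of global interconnect arbitration: grant up to
    num_ports requests; the rest are deferred to the next cycle."""
    granted = requests[:num_ports]
    deferred = requests[num_ports:]
    return granted, deferred
```

A real interconnect would use fairer ordering (e.g. round-robin) rather than simple list order; the sketch only shows that utilization is granted cycle by cycle.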


    DECENTRALIZED ALLOCATION OF RESOURCES AND INTERCONNNECT STRUCTURES TO SUPPORT THE EXECUTION OF INSTRUCTION SEQUENCES BY A PLURALITY OF ENGINES
    10.
    Patent application (in force)

    Publication number: US20120297170A1

    Publication date: 2012-11-22

    Application number: US13475708

    Filing date: 2012-05-18

    Abstract: A method for decentralized resource allocation in an integrated circuit. The method includes receiving a plurality of requests from a plurality of resource consumers of a plurality of partitionable engines to access a plurality of resources, wherein the resources are spread across the plurality of engines and are accessed via a global interconnect structure. At each resource, the number of requests for access to said each resource is added. At said each resource, the number of requests is compared against a threshold limiter. At said each resource, a subsequent request that is received that exceeds the threshold limiter is canceled. Subsequently, requests that are not canceled within a current clock cycle are implemented.
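The decentralized scheme above can be sketched per resource. This is an illustrative model (the class and interfaces are assumptions, not the patented circuit): each resource independently counts the requests aimed at it in a cycle, cancels any request beyond its threshold limiter, and implements the rest at the end of the cycle, with no central arbiter involved.

```python
class Resource:
    """One resource with its own threshold limiter, allocated decentrally."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # threshold limiter
        self.count = 0              # requests added so far this cycle
        self.accepted = []

    def request(self, req) -> bool:
        """Add one request; cancel it if it exceeds the threshold limiter."""
        self.count += 1
        if self.count > self.threshold:
            return False            # cancelled
        self.accepted.append(req)
        return True

    def end_cycle(self) -> list:
        """Implement all requests not cancelled this cycle, then reset."""
        implemented, self.accepted, self.count = self.accepted, [], 0
        return implemented
```

Because each resource only compares its own request count against its own limiter, the decision requires no global coordination, which is the point of the decentralized design.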

