USING A CONVERSION LOOK ASIDE BUFFER TO IMPLEMENT AN INSTRUCTION SET AGNOSTIC RUNTIME ARCHITECTURE
    1.
    Invention Application
    USING A CONVERSION LOOK ASIDE BUFFER TO IMPLEMENT AN INSTRUCTION SET AGNOSTIC RUNTIME ARCHITECTURE (Pending – Published)

    Publication No.: WO2016014867A9

    Publication Date: 2016-09-09

    Application No.: PCT/US2015041851

    Filing Date: 2015-07-23

    Abstract: A system for an agnostic runtime architecture. The system includes a system emulation/virtualization converter, an application code converter, and a system converter, wherein the system emulation/virtualization converter and the application code converter implement a system emulation process. The system converter implements a system and application conversion process for executing code from a guest image, wherein the system converter or the system emulator accesses a plurality of guest instructions that comprise multiple guest branch instructions, and assembles the plurality of guest instructions into a guest instruction block. The system converter also translates the guest instruction block into a corresponding native conversion block, stores the native conversion block into a native cache, and stores a mapping of the guest instruction block to the corresponding native conversion block in a conversion look aside buffer. Upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates that the guest instruction has a corresponding converted native instruction in the native cache, and the converted native instruction is forwarded for execution in response to the hit.
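
    The translate-then-cache flow in the abstract can be modeled in a few lines of software. Below is a minimal Python sketch of a conversion lookaside buffer (the class name, the fake "native:" translation tags, and the addresses are all illustrative assumptions, not the patented design): a guest block is translated once into a native conversion block, the mapping is recorded, and a later lookup either hits and forwards the native block or misses.

```python
class ConversionLookasideBuffer:
    """Illustrative software model of a CLB: maps guest block addresses
    to translated native blocks held in a native code cache."""

    def __init__(self):
        self.mapping = {}       # guest block address -> native cache index
        self.native_cache = []  # translated native conversion blocks

    def translate(self, guest_addr, guest_block):
        # Stand-in for real binary translation: tag each guest op.
        native_block = ["native:" + op for op in guest_block]
        self.native_cache.append(native_block)
        self.mapping[guest_addr] = len(self.native_cache) - 1
        return native_block

    def lookup(self, guest_addr):
        # Index the CLB; on a hit, forward the converted native block.
        idx = self.mapping.get(guest_addr)
        return None if idx is None else self.native_cache[idx]


clb = ConversionLookasideBuffer()
clb.translate(0x1000, ["g_load", "g_add", "g_branch"])
hit = clb.lookup(0x1000)   # hit: the native conversion block is forwarded
miss = clb.lookup(0x2000)  # miss: translation would be invoked instead
```

    On a miss, a real runtime would fall back to the translator and install the new mapping; the model keeps those steps separate for clarity.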


    METHOD AND APPARATUS FOR EFFICIENT SCHEDULING FOR ASYMMETRICAL EXECUTION UNITS
    2.
    Invention Application
    METHOD AND APPARATUS FOR EFFICIENT SCHEDULING FOR ASYMMETRICAL EXECUTION UNITS (Pending – Published)

    Publication No.: WO2014152359A1

    Publication Date: 2014-09-25

    Application No.: PCT/US2014/027252

    Filing Date: 2014-03-14

    Inventor: CHAN, Nelson, N.

    Abstract: A method for performing instruction scheduling in an out-of-order microprocessor pipeline is disclosed. The method comprises selecting a first set of instructions to dispatch from a scheduler to an execution module, wherein the execution module comprises two types of execution units. The first type of execution unit executes both a first and a second type of instruction, while the second type of execution unit executes only the second type. Next, the method comprises selecting a second set of instructions to dispatch, which is a subset of the first set and comprises only instructions of the second type. Next, the method comprises determining a third set of instructions, which comprises the instructions not selected as part of the second set. Finally, the method comprises dispatching the second set for execution using the second type of execution unit and dispatching the third set for execution using the first type of execution unit.
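
    As a rough software analogue of the selection steps above (the function, the tuple encoding, and the example instructions are invented for illustration), the following Python sketch partitions a dispatch window: type-2 instructions fill the type-2-only units first, and everything left over goes to the general units that can execute both types.

```python
def schedule(first_set, num_type2_units):
    """Partition a first set of (name, type) instructions into:
    second set -> dispatched to type-2-only execution units,
    third set  -> dispatched to type-1 units, which execute both types."""
    second_idx = [i for i, (_, typ) in enumerate(first_set)
                  if typ == 2][:num_type2_units]
    second_set = [first_set[i] for i in second_idx]
    third_set = [ins for i, ins in enumerate(first_set) if i not in second_idx]
    return second_set, third_set


first = [("mul", 1), ("add", 2), ("sub", 2), ("div", 1)]
second, third = schedule(first, num_type2_units=1)
# second -> [("add", 2)]; third -> [("mul", 1), ("sub", 2), ("div", 1)]
```

    Routing type-2 work to the restricted units first keeps the general-purpose units free for the instructions only they can execute.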


    METHOD AND APPARATUS FOR SORTING ELEMENTS IN HARDWARE STRUCTURES
    3.
    Invention Application
    METHOD AND APPARATUS FOR SORTING ELEMENTS IN HARDWARE STRUCTURES (Pending – Published)

    Publication No.: WO2014151722A1

    Publication Date: 2014-09-25

    Application No.: PCT/US2014/026312

    Filing Date: 2014-03-13

    CPC classification number: G06F9/3855 G06F9/3834 G06F9/3857

    Abstract: A method for sorting elements in hardware structures is disclosed. The method comprises selecting, from an unordered input queue (UIQ), a plurality of elements to order within a predetermined range, in response to finding a match between at least one most significant bit of the predetermined range and the corresponding bits of a respective identifier associated with each of the plurality of elements. The method further comprises presenting each of the plurality of elements to a respective multiplexer. Further, the method comprises generating a select signal for an enabled multiplexer in response to finding a match between at least one least significant bit of the respective identifier associated with each of the plurality of elements and a port number of the ordered queue. Finally, the method comprises forwarding a packet associated with a selected element identifier from the enabled multiplexer to the matching port number of the ordered queue.
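
    The MSB/LSB matching scheme can be illustrated with a small Python model (the bit widths, tags, and packet values are assumptions made for the example): the high bits of an identifier decide whether an element falls in the selected range, and the low bits act as the multiplexer select that picks the ordered-queue port.

```python
def route_to_ordered_queue(uiq, range_tag, msb_shift, num_ports):
    """Move (identifier, packet) elements from an unordered input queue
    (UIQ) into ordered ports: MSBs select elements in the current range,
    LSBs select the destination port (the mux select signal)."""
    ordered = [None] * num_ports
    for ident, packet in uiq:
        if ident >> msb_shift == range_tag:   # MSB match: element in range
            port = ident & (num_ports - 1)    # LSB match: port number
            ordered[port] = packet            # enabled mux forwards packet
    return ordered


uiq = [(0b1010, "A"), (0b0110, "B"), (0b1001, "C")]
ordered = route_to_ordered_queue(uiq, range_tag=0b10, msb_shift=2, num_ports=4)
# ordered -> [None, "C", "A", None]; element B is outside the range
```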


    METHOD AND APPARATUS FOR GUEST RETURN ADDRESS STACK EMULATION SUPPORTING SPECULATION
    4.
    Invention Application
    METHOD AND APPARATUS FOR GUEST RETURN ADDRESS STACK EMULATION SUPPORTING SPECULATION (Pending – Published)

    Publication No.: WO2014151691A1

    Publication Date: 2014-09-25

    Application No.: PCT/US2014/026252

    Filing Date: 2014-03-13

    CPC classification number: G06F9/3806 G06F9/3017

    Abstract: A microprocessor-implemented method for maintaining a guest return address stack in an out-of-order microprocessor pipeline is disclosed. The method comprises mapping a plurality of instructions in a guest address space into a corresponding plurality of instructions in a native address space. For each function call instruction in the native address space fetched during execution, the method also comprises performing the following: (a) pushing a current entry onto a guest return address stack (GRAS) responsive to a function call, wherein the GRAS is maintained at the fetch stage of the pipeline, and wherein the current entry comprises information regarding both a guest target return address and a corresponding native target return address associated with the function call; (b) popping the current entry from the GRAS in response to processing a return instruction; and (c) fetching instructions from the native target return address in the current entry after the pop from the GRAS.
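
    Steps (a) through (c) map naturally onto a small stack model. The Python sketch below (addresses, class and method names are illustrative assumptions) pushes a paired guest/native return address on each call and, on a return, pops the entry and resumes fetch at the native target.

```python
class GuestReturnAddressStack:
    """Illustrative model of a GRAS kept at the fetch stage: each entry
    pairs a guest return address with its mapped native return address."""

    def __init__(self):
        self.stack = []

    def on_call(self, guest_ret, native_ret):
        self.stack.append((guest_ret, native_ret))  # step (a): push entry

    def on_return(self):
        _guest_ret, native_ret = self.stack.pop()   # step (b): pop entry
        return native_ret                           # step (c): fetch target


gras = GuestReturnAddressStack()
gras.on_call(guest_ret=0x0400, native_ret=0x9400)
gras.on_call(guest_ret=0x0500, native_ret=0x9500)  # nested call
target = gras.on_return()  # -> 0x9500, native target of the inner call
```

    Keeping the structure at the fetch stage lets return targets be predicted before the return instruction itself executes, which is what makes the emulation speculation-friendly.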


    A VIRTUAL LOAD STORE QUEUE HAVING A DYNAMIC DISPATCH WINDOW WITH A UNIFIED STRUCTURE
    5.

    Publication No.: WO2013188705A3

    Publication Date: 2013-12-19

    Application No.: PCT/US2013/045734

    Filing Date: 2013-06-13

    Abstract: An out-of-order processor. The processor includes a virtual load/store queue for allocating a plurality of loads and a plurality of stores, wherein more loads and more stores can be accommodated beyond the actual physical size of the processor's load/store queue; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load/store queue.
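
    A toy Python model of the virtual-queue idea (the class, the string return values, and the two-entry physical size are invented for illustration): memory operations take physical entries while they last, overflow into a virtual extension, and non-memory instructions dispatch regardless.

```python
class VirtualLoadStoreQueue:
    """Illustrative model: allocation is not capped at the physical size,
    and non-memory ops proceed even when loads/stores are waiting."""

    def __init__(self, physical_size):
        self.physical_size = physical_size
        self.physical = []  # loads/stores holding real queue entries
        self.virtual = []   # loads/stores allocated beyond physical size

    def allocate(self, op):
        if op in ("load", "store"):
            if len(self.physical) < self.physical_size:
                self.physical.append(op)
                return "physical"
            self.virtual.append(op)   # accommodated beyond physical size
            return "virtual"
        return "dispatched"           # other instructions are not blocked


lsq = VirtualLoadStoreQueue(physical_size=2)
results = [lsq.allocate(op) for op in ("load", "store", "load", "add")]
# results -> ["physical", "physical", "virtual", "dispatched"]
```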

    AN INSTRUCTION DEFINITION TO IMPLEMENT LOAD STORE REORDERING AND OPTIMIZATION
    6.

    Publication No.: WO2013188696A3

    Publication Date: 2013-12-19

    Application No.: PCT/US2013/045722

    Filing Date: 2013-06-13

    Abstract: A method for forwarding data from store instructions to a corresponding load instruction in an out-of-order processor. The method includes accessing an incoming sequence of instructions and, within said sequence of instructions, splitting store instructions into a store address instruction and a store data instruction, wherein the store address instruction performs address calculation and fetch, and wherein the store data instruction moves register contents to a memory address. The method further includes, within said sequence of instructions, splitting load instructions into a load address instruction and a load data instruction, wherein the load address instruction performs address calculation and fetch, and wherein the load data instruction loads memory address contents into a register, and reordering the store address and load address instructions earlier in the instruction sequence, further away from the corresponding SD/LD instructions, to enable earlier dispatch and execution of the loads and the stores.
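
    The splitting step can be sketched as a simple rewrite pass (the mnemonics and the blanket hoisting of all address ops are simplifications for illustration; a real implementation would respect data dependences): each store becomes SA+SD, each load becomes LA+LD, and the address halves are moved earlier in the sequence.

```python
def split_and_reorder(seq):
    """Split ST ops into store-address (SA) + store-data (SD) and LD ops
    into load-address (LA) + load-data (LD), then hoist the address parts
    ahead so address calculation can dispatch earlier (illustrative)."""
    addr_ops, rest = [], []
    for op in seq:
        if op.startswith("ST"):
            addr_ops.append("SA(" + op + ")")
            rest.append("SD(" + op + ")")
        elif op.startswith("LD"):
            addr_ops.append("LA(" + op + ")")
            rest.append("LD(" + op + ")")
        else:
            rest.append(op)
    return addr_ops + rest


out = split_and_reorder(["ST r1,[a]", "add", "LD r2,[b]"])
# out -> ["SA(ST r1,[a])", "LA(LD r2,[b])",
#         "SD(ST r1,[a])", "add", "LD(LD r2,[b])"]
```

    Hoisting only the address halves lets the machine resolve addresses (and detect aliasing) early while the data-moving halves stay in place.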

    A VIRTUAL LOAD STORE QUEUE HAVING A DYNAMIC DISPATCH WINDOW WITH A DISTRIBUTED STRUCTURE
    7.
    Invention Application
    A VIRTUAL LOAD STORE QUEUE HAVING A DYNAMIC DISPATCH WINDOW WITH A DISTRIBUTED STRUCTURE (Pending – Published)

    Publication No.: WO2013188460A2

    Publication Date: 2013-12-19

    Application No.: PCT/US2013/045261

    Filing Date: 2013-06-11

    Abstract: An out-of-order processor. The processor includes a distributed load queue and a distributed store queue that maintain single-program sequential semantics while allowing an out-of-order dispatch of loads and stores across a plurality of cores and memory fragments; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load/store queue.


    A METHOD AND SYSTEM FOR FILTERING THE STORES TO PREVENT ALL STORES FROM HAVING TO SNOOP CHECK AGAINST ALL WORDS OF A CACHE
    8.
    Invention Application
    A METHOD AND SYSTEM FOR FILTERING THE STORES TO PREVENT ALL STORES FROM HAVING TO SNOOP CHECK AGAINST ALL WORDS OF A CACHE (Pending – Published)

    Publication No.: WO2013188414A2

    Publication Date: 2013-12-19

    Application No.: PCT/US2013/045193

    Filing Date: 2013-06-11

    Abstract: In a processor, a method for filtering stores to prevent all stores from having to snoop check against all words of a cache. The method includes implementing a cache wherein stores snoop the caches for address matches to maintain coherency; marking a portion of a cache line, by using an access mask, if a given core out of a plurality of cores loads from that portion; checking the access mask upon execution of subsequent stores to the cache line; and causing a misprediction when a subsequent store to the portion of the cache line sees a prior mark from a load in the access mask.
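
    The access-mask idea can be modeled per cache line (the word count and method names are assumptions for the example): loads set a bit for the word they touch, and a later store only needs to test that bit instead of snoop-checking every word of the cache.

```python
class AccessMaskLine:
    """Illustrative cache-line model with a per-word access mask."""

    def __init__(self, words=8):
        self.mask = [False] * words

    def load(self, word):
        self.mask[word] = True   # mark the loaded portion of the line

    def store(self, word):
        # A set bit means a prior load touched this word: misprediction.
        return self.mask[word]


line = AccessMaskLine(words=8)
line.load(3)                # a core loads word 3, setting its mask bit
mispredict = line.store(3)  # store hits the mark -> True (mispredict)
clean = line.store(2)       # untouched word -> False (no conflict)
```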


    A LOAD STORE BUFFER AGNOSTIC TO THREADS IMPLEMENTING FORWARDING FROM DIFFERENT THREADS BASED ON STORE SENIORITY
    9.
    Invention Application
    A LOAD STORE BUFFER AGNOSTIC TO THREADS IMPLEMENTING FORWARDING FROM DIFFERENT THREADS BASED ON STORE SENIORITY (Pending – Published)

    Publication No.: WO2013188311A1

    Publication Date: 2013-12-19

    Application No.: PCT/US2013/045020

    Filing Date: 2013-06-10

    Abstract: In a processor, a thread-agnostic unified store queue and a unified load queue method for out-of-order loads in a memory consistency model using shared memory resources. The method includes implementing a memory resource that can be accessed by a plurality of asynchronous cores, wherein the plurality of cores share a unified store queue and a unified load queue; and implementing an access mask that functions by tracking which words of a cache line are accessed via a load, wherein the cache line includes the memory resource, wherein the load sets a mask bit within the access mask when accessing a word of the cache line, and wherein the mask bit blocks accesses by other loads from a plurality of cores. The method further includes checking the access mask upon execution of subsequent stores from the plurality of cores to the cache line, wherein stores from different threads can forward to loads of different threads while still maintaining in-order memory consistency semantics; and causing a misprediction when a subsequent store to the portion of the cache line sees a prior mark from a load in the access mask, wherein the subsequent store will signal a load queue entry corresponding to that load by using a tracker register and a thread ID register.
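
    The thread-agnostic forwarding rule, that a load takes its data from the youngest earlier store to the same address regardless of which thread produced it, can be sketched as follows (sequence numbers stand in for store seniority; the class, method names, and values are invented for the example).

```python
class UnifiedStoreQueue:
    """Illustrative model of a store queue shared by all threads."""

    def __init__(self):
        self.entries = []  # (seq, thread_id, addr, data), program order

    def store(self, seq, thread_id, addr, data):
        self.entries.append((seq, thread_id, addr, data))

    def forward_to_load(self, load_seq, addr):
        # Consider only stores senior to (older than) the load, to the
        # same address, from any thread; forward the youngest of them.
        older = [e for e in self.entries if e[0] < load_seq and e[2] == addr]
        return max(older, key=lambda e: e[0])[3] if older else None


usq = UnifiedStoreQueue()
usq.store(seq=1, thread_id=0, addr=0x40, data="from-thread-0")
usq.store(seq=3, thread_id=1, addr=0x40, data="from-thread-1")
value = usq.forward_to_load(load_seq=5, addr=0x40)  # -> "from-thread-1"
```

    Note the thread IDs never enter the forwarding decision; only store seniority and the address do, which is the sense in which the buffer is thread-agnostic.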


    CACHE REPLACEMENT POLICY
    10.
    Invention Application

    Publication No.: WO2013089786A8

    Publication Date: 2013-06-20

    Application No.: PCT/US2011/065584

    Filing Date: 2011-12-16

    Abstract: Cache replacement policy. In accordance with a first embodiment of the present invention, an apparatus comprises a queue memory structure configured to queue cache requests that miss a second cache after missing a first cache. The apparatus comprises additional memory, associated with the queue memory structure, that is configured to record an evict way of the cache requests for the cache. The apparatus may be further configured to lock the evict way recorded in the additional memory, for example, to prevent reuse of the evict way. The apparatus may be further configured to unlock the evict way responsive to a fill from the second cache to the cache. The additional memory may be a component of a higher-level cache.
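
    The lock/unlock protocol on evict ways can be sketched with a small Python model (the way count, class, and method names are assumptions for the example): each outstanding miss locks the way it will evict into, so a second miss cannot pick the same way until the fill releases it.

```python
class EvictWayTracker:
    """Illustrative model of the queue-side bookkeeping: record and lock
    an evict way per outstanding miss; unlock it on the fill."""

    def __init__(self, num_ways):
        self.num_ways = num_ways
        self.locked = set()

    def allocate(self):
        for way in range(self.num_ways):
            if way not in self.locked:
                self.locked.add(way)  # record and lock the evict way
                return way
        return None                   # every way locked: request must wait

    def fill(self, way):
        self.locked.discard(way)      # fill from the second cache unlocks


t = EvictWayTracker(num_ways=2)
w0 = t.allocate()   # -> 0, now locked
w1 = t.allocate()   # -> 1, now locked
w2 = t.allocate()   # -> None, no reusable way until a fill arrives
t.fill(w0)
w3 = t.allocate()   # -> 0, way 0 is reusable again
```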
